Suppose that $Y$ is a scalar and $X$ is a second-order stochastic
process, where $Y$ and $X$ are conditionally independent given the random
variables $\xi_1, \ldots, \xi_p$ which belong to the closed span $L_X^2$ of $X$.
This paper investigates a unified framework for the inverse regression
dimension-reduction problem. It is found that the identification
of $L_X^2$ with the reproducing kernel Hilbert space of $X$ provides a platform
for a seamless extension from the finite- to infinite-dimensional
settings. It also facilitates convenient computational algorithms that
can be applied to a variety of models.

1. Introduction. Identifying the space spanned by the inverse regres- 
sion function leads to a highly effective dimension-reduction approach for 
nonparametric regression function estimation. See Duan and Li (1991), Li 
(1991), Chen and Li (1998), Cook (1998) and Cook and Li (2002), among 
others. In this paper, we consider the approach in the context where the 
predictor is a stochastic process. Our goal is to introduce a unified formula- 
tion that can be applied to a wide variety of models, and, at the same time, 
retains the spirit of multivariate analysis so that statistical inference can be 
carried out in a natural and efficient manner. 

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $L^2(\Omega, \mathcal{F}, P)$ be the Hilbert
space containing all random variables on $(\Omega, \mathcal{F}, P)$ that have finite variances,
with inner product defined by $\langle U, V \rangle_{L^2(\Omega, \mathcal{F}, P)} = E(UV)$. Let $Y$ be a
random element defined on $(\Omega, \mathcal{F}, P)$. The nature of $Y$ critically influences
the construction of the computational algorithms, but is of no relevance in
the theoretical formulation of the inverse regression problem. Let $\{X_t, t \in T\}$
be a real-valued, zero-mean, second-order stochastic process defined on
$(\Omega, \mathcal{F}, P)$, where the index set $T$ is assumed to be a separable metric space.
Here, $T$ may be quite flexible: it can be a single homogeneous set or a
union of sets of different topological nature; for example, $T = \bigcup_q T_q$,
where $T_1 = [a, b]$, $T_2 = \{t_1, \ldots, t_J\}$, and so on, in which case one can think of
the restrictions of $X_t$ to the $T_q$ as covariates of different functional nature.
Note that we do not assume that the paths of Xt lie in a known Hilbert 
space, which is a common assumption in functional data analysis literature 
[cf. Ramsay and Silverman (2005), Dauxois, Ferre and Yao (2001) and Ferre 
and Yao (2003, 2005)]. Indeed, in the infinite-dimensional case, such an 
assumption may be restrictive and the identification of the Hilbert space 
may pose an extra problem in practice. Eubank and Hsing (2007) contains 
a discussion on the theoretical limitations of this assumption. 

As usual, the Hilbert space $L_X^2$ of $\{X_t, t \in T\}$ is defined as the subspace
of $L^2(\Omega, \mathcal{F}, P)$ that contains all finite linear combinations of the form
$\sum_{i=1}^{k} c_i X_{t_i}$, $t_i \in T$, $c_i \in \mathbb{R}$, $k = 1, 2, \ldots$, and their limits in
$L^2(\Omega, \mathcal{F}, P)$. See Ash and Gardner (1975) for details of these notions.

Define the following conditions (IR1) and (IR2), in which $\xi_1, \ldots, \xi_p$ are
fixed elements in $L_X^2$:

(IR1) $Y$ and $X$ are conditionally independent given $\xi_1, \ldots, \xi_p$.
(IR2) For any $\xi \in L_X^2$, $E(\xi \mid \xi_1, \ldots, \xi_p) \in \mathrm{span}\{\xi_1, \ldots, \xi_p\}$ a.s.

A particularly relevant model for which (IR1) holds is the multiple-index
model

(1) $Y = \ell(\xi_1, \ldots, \xi_p, \varepsilon)$,

where $\varepsilon$ is a random error independent of the process $\{X_t\}$, and we call
each $\xi_j$ an index and $\ell$ the link function. The number of indices, the indices
themselves and the link function are all assumed unknown in practice.

Condition (IR2) holds if the joint distribution of any finite collection of
elements from $L_X^2$ is elliptically contoured, which would be the case if, for
instance, $\{X_t\}$ is a Gaussian process. However, (IR2) can hold much more
generally [see Hall and Li (1993)]. It is clear that the indices $\xi_j$ in (1) are
nonidentifiable if $\ell$ is not specified. However, the $L^2$ subspace

$L^2_{X,e} := \mathrm{span}\{\xi_1, \ldots, \xi_p\}$

is identifiable. Following Li (1991), call $L^2_{X,e}$ the effective dimension-reduction
space (EDRS) for (1). We are interested in estimating the EDRS, and in
some situations, the link $\ell$.

It might be awkward to conceptualize the estimation of $L^2_{X,e}$ directly
since it is a space of random variables. In some cases, this problem can be
overcome naturally. For instance, if the sample paths of $X_t$ are contained
in a Hilbert space $\mathcal{H}$ and $\xi_j = \langle \beta_j, X \rangle_{\mathcal{H}}$, where $\beta_j$ is the representer of the
functional, then the problem of estimating $L^2_{X,e}$ can be solved by estimating
the space spanned by the $\beta_j$. Indeed, this is the approach adopted for the
multivariate case in Li (1991) and for the functional data case in Ferre
and Yao (2003, 2005). See also Dauxois, Ferre and Yao (2001). However,
as mentioned earlier, we do not assume that the sample paths of $X_t$ are
contained in a Hilbert space. Thus, we are interested in a natural and flexible
representation of the $\xi_j$. Our solution is the reproducing kernel Hilbert space
(RKHS) of $X_t$. It is known that the RKHS of $X_t$ is a mirror image of $L_X^2$ in
terms of Hilbert space structure (cf. Section 2), and so the estimation of the
EDRS in $L_X^2$ can, in principle, be accomplished through the estimation of
the corresponding space in the RKHS. The primary goal of this paper is to
show how this idea can be implemented, and the advantages of the approach.
It is interesting to note that the possibility of such an RKHS formulation
was mentioned very briefly in Remark 2.4 of Li (1992).

The structure of this paper is as follows. We review the basic properties 
of RKHS in Section 2. We do so out of the concern that the notion of RKHS 
is not a part of the standard statistics curriculum today and our readers 
may not be familiar with the relevant facts required in this paper. Section 3 
contains a key theoretical result on the inverse regression function $E(X_t \mid Y)$
that facilitates dimension reduction. Estimation issues are addressed in Section 4,
where an asymptotic theory is also developed; the inference is
based on the data $(x_i, y_i)$, $i = 1, \ldots, n$, with each $x_i$ observed
at a finite set of points. In Section 5, we provide a number of numerical examples,
including simulation studies and a data analysis. Finally, the proofs
are collected in Section 6.

We should mention that the present paper focuses on the basic RKHS for- 
mulation of the inverse regression dimension-reduction problem but ignores 
many important theoretical and methodological aspects that go along with 
the formulation, such as tests for determining p in (1), choice of the number 
of slices in the sliced inverse regression procedure, estimating smooth representers
$\beta_j$ when $\xi_j = \langle \beta_j, X \rangle_{\mathcal{H}}$, and so on. We hope these will be pursued
in future work by those who find this approach meaningful.

2. Reproducing kernel Hilbert spaces. Since the seminal work of Parzen 
(1959, 1961a, 1961b, 1963), statistical innovations using RKHS have been 
steadily developed. See Wahba (1990), Gu (2002) and Berlinet and Thomas-Agnan
(2004). A quick survey reveals that the notion of RKHS is now embraced
strongly by the machine learning community due to its importance
in regularization problems. In this section we present the general definitions 
and common properties of RKHS required in this paper. The details of most 
of the results can be found in Aronszajn (1950). Other relevant references 
will be provided in due course. In order to be self-contained, short proofs 
are provided whenever suitable in Section 6. 
A symmetric, real-valued bivariate function $K$ defined on $T \times T$ is said to be
nonnegative definite, denoted by $K \ge 0$, if for all $n \in \mathbb{N}$, $a_1, \ldots, a_n \in \mathbb{R}$, and
$t_1, \ldots, t_n \in T$, we have $\sum_{i,j=1}^{n} a_i a_j K(t_i, t_j) \ge 0$. For convenience, symmetric
nonnegative definite bivariate functions will be referred to as covariance
kernels below. Also, for any bivariate function $K$, write

$K_t = K(\cdot, t)$.

Definition 1. A Hilbert space $\mathcal{H}$ is said to be a RKHS if the elements
of $\mathcal{H}$ are functions defined on some set $T$, and there is a bivariate function
$K$ on $T \times T$ having the following two properties:

(a) For all $t \in T$, $K_t \in \mathcal{H}$.

(b) For all $t \in T$ and $f \in \mathcal{H}$, $f(t) = \langle f, K_t \rangle_{\mathcal{H}}$.

In this case, $K$ is said to be a reproducing kernel of $\mathcal{H}$.

The following fundamental result is known as the Moore-Aronszajn theorem.

Proposition 1. (a) If $K$ is a reproducing kernel of $\mathcal{H}$, then $K$ is a
covariance kernel and is unique. Conversely, if $K$ is a covariance kernel on
$T \times T$, a unique RKHS of functions on $T$ with $K$ as the reproducing kernel
can be constructed.

(b) If $K$ is the reproducing kernel of the RKHS $\mathcal{H}$, then $\mathrm{span}\{K_t, t \in T\}$
is dense in $\mathcal{H}$.

Property (b) of Definition 1, called the reproducing property, is the essence
of the notion of RKHS and will be applied extensively throughout this paper.
The notation $\mathcal{H}_K$ will be used to denote the RKHS having the reproducing
kernel $K$.
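As a numerical aside, the reproducing property can be checked directly when $T$ is a finite set, in which case the RKHS inner product reduces to $f^T K^- g$ for vectors $f, g$ in the column space of the kernel matrix. The sketch below is an illustration only: the grid, the Brownian-motion kernel $\min(s, t)$ and the helper name `rkhs_inner` are assumptions made for the example and are not part of the development in this paper.

```python
import numpy as np

def rkhs_inner(f, g, K):
    """RKHS inner product <f, g> = f^T K^- g on a finite index set.

    K^- is the Moore-Penrose inverse; f and g are assumed to lie in the
    column space of K, which is when this formula is valid.
    """
    return float(f @ np.linalg.pinv(K) @ g)

# Assumed example: Brownian-motion kernel K(s, t) = min(s, t) on a grid.
t = np.linspace(0.1, 1.0, 10)
K = np.minimum.outer(t, t)

# An arbitrary element of the RKHS: a linear combination of the K_t.
coef = np.arange(1.0, 11.0)
f = K @ coef

# Reproducing property: <f, K_t> should return f(t) at every grid point.
reproduced = np.array([rkhs_inner(f, K[:, j], K) for j in range(len(t))])
max_error = float(np.max(np.abs(reproduced - f)))
```

Here `max_error` measures how far the computed inner products $\langle f, K_t \rangle$ are from the function values $f(t)$; it should be zero up to floating-point rounding.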

A key reason why RKHS plays an important role in statistics is
that the Hilbert space of a second-order stochastic process can be represented
by the RKHS whose reproducing kernel equals the covariance function
of the process. To see that, consider a second-order, zero-mean process
$\{X_t, t \in T\}$ with covariance function $R$. As usual, $\mathcal{H}_R$ denotes the RKHS
with reproducing kernel $R$. Consider the linear map $\Psi_X$ from $L_X^2$ to $\mathcal{H}_R$
satisfying

$\Psi_X(X_t) = R_t$, $t \in T$.

Proposition 2. $\Psi_X$ is an isometric isomorphism, namely, it is one-to-one
and satisfies $\langle \eta, \xi \rangle_{L_X^2} = \langle \Psi_X(\eta), \Psi_X(\xi) \rangle_{\mathcal{H}_R}$, $\eta, \xi \in L_X^2$.

The mapping $\Psi_X$ was introduced by Loève (1948) and is sometimes referred
to as Loève's isometry. For more information on the duality between
a stochastic process and its RKHS, see Wahba (1990).

The following result, given in Theorem 1.1 of Fortet (1973), provides an
insightful way to compute the RKHS norm.

Proposition 3. A function $f$ on $T$ is in $\mathcal{H}_R$ iff

(2) $\sup_{n} \sup_{t_1, \ldots, t_n,\, a_1, \ldots, a_n} \frac{\left(\sum_{i=1}^{n} a_i f(t_i)\right)^2}{\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j R(t_i, t_j)} < \infty$,

where the suprema are taken over all $t_1, \ldots, t_n \in T$ and all real $a_1, \ldots, a_n$
for all $n$, such that the denominator in (2) is nonzero. If $f \in \mathcal{H}_R$, then the
left-hand side of (2) is the squared RKHS norm $\|f\|^2_{\mathcal{H}_R}$.

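Proposition 4. Suppose that $K_1$ and $K_2$ are two covariance kernels on
$T \times T$ with $K_2 - K_1 \ge 0$. Then:

(a) $\mathcal{H}_{K_1} \subset \mathcal{H}_{K_2}$, where $\|f\|_{\mathcal{H}_{K_2}} \le \|f\|_{\mathcal{H}_{K_1}}$ for $f \in \mathcal{H}_{K_1}$, and

(b) the linear operator $L : \mathcal{H}_{K_2} \to \mathcal{H}_{K_1}$ for which

$L K_2(\cdot, t) = K_1(\cdot, t)$, $t \in T$,

is a bounded, nonnegative definite, and self-adjoint operator on $\mathcal{H}_{K_2}$.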

Definition 2. Under the assumption of Proposition 4, we say that $K_2$
dominates $K_1$ if $K_2 - K_1 \ge 0$, denoted by $K_2 \ge K_1$, and call $L$ the dominance
operator of $\mathcal{H}_{K_2}$ over $\mathcal{H}_{K_1}$. If $L$ is nuclear, or trace-class, namely $L$ satisfies
$\mathrm{tr}(L) < \infty$, we say that $K_2$ nuclear-dominates $K_1$, denoted by $K_2 \gg K_1$,
and $L$ is called a nuclear dominance operator.

The trivial case $K_2 = K_1$ serves as an illustration: here
$L$ is the identity mapping. Whether $K_2 \gg K_1$ in this case, of course, depends
on the dimensionality of $T$.

Let $T$ be an index set and $T_1 \subset T$. For any $f$ defined on $T$, let $f|_{T_1}$ stand
for the restriction of $f$ to the subset $T_1$.

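Proposition 5. Let $T$ be a separable metric space of which $S_\infty = \{s_1, s_2, \ldots\}$
is a dense subset. Let $K$ be a covariance kernel on $T \times T$ and
$K_n = K|_{S_n \times S_n}$, where $S_n = \{s_1, \ldots, s_n\}$. For any function $f$ defined on $T$, write
$f_n = f|_{S_n}$. The following hold:

(a) For any function $f$ defined on $T$, if for some $n \ge 1$, $f_n \in \mathcal{H}_{K_n}$, then
$f_m \in \mathcal{H}_{K_m}$ for any $m \le n$ and

$\|f_m\|_{\mathcal{H}_{K_m}} \le \|f_n\|_{\mathcal{H}_{K_n}}$.

(b) Let $f_n \in \mathcal{H}_{K_n}$ for any $n$, and $\lim_{n \to \infty} \|f_n\|_{\mathcal{H}_{K_n}} < \infty$. If either $T$ is
countable or both $K$ and $f$ are continuous functions defined on $T \times T$ and
$T$, respectively, then $f \in \mathcal{H}_K$ and $\|f\|_{\mathcal{H}_K} = \lim_{n \to \infty} \|f_n\|_{\mathcal{H}_{K_n}}$.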
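The monotonicity in part (a) and the limit in part (b) of Proposition 5 can be seen numerically in a simple case. The following sketch is illustrative only and rests on assumptions that are not in the text: it takes the Brownian-motion kernel $K(s, t) = \min(s, t)$ on $[0, 1]$, the function $f(t) = t^2$, and a particular dyadic ordering of the points $s_1, s_2, \ldots$. For this kernel the squared RKHS norm of such an $f$ is $\int_0^1 f'(t)^2 \, dt = 4/3$, so the restricted squared norms should increase toward $4/3$.

```python
import numpy as np

def restricted_sq_norm(f_vals, K_n):
    """Squared RKHS norm f_n^T K_n^{-1} f_n of a function restricted to n points."""
    return float(f_vals @ np.linalg.solve(K_n, f_vals))

def kernel(s, t):
    """Assumed kernel: Brownian-motion covariance min(s, t)."""
    return np.minimum(s, t)

def f(t):
    """Assumed test function f(t) = t^2."""
    return t ** 2

# Nested point sets S_1, S_2, ...: the first n entries of a dyadic sequence.
s = np.array([1.0, 0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875])

sq_norms = []
for n in range(1, len(s) + 1):
    pts = s[:n]
    K_n = kernel(pts[:, None], pts[None, :])
    sq_norms.append(restricted_sq_norm(f(pts), K_n))
sq_norms = np.array(sq_norms)
```

The array `sq_norms` holds $\|f_n\|^2_{\mathcal{H}_{K_n}}$ for $n = 1, \ldots, 8$; the values are nondecreasing in $n$ and stay below $4/3$.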
3. The covariance operator of inverse regression. Below we continue to
use the notation developed in Sections 1 and 2, and assume that (IR1) and
(IR2) hold. As in Section 1, $L^2_{X,e} = \mathrm{span}\{\xi_1, \ldots, \xi_p\}$ denotes the EDRS of
(1) in $L_X^2$. Define the counterpart of the EDRS in $\mathcal{H}_R$:

$\mathcal{H}_{X,e} = \Psi_X(L^2_{X,e}) = \mathrm{span}\{\Psi_X(\xi_1), \ldots, \Psi_X(\xi_p)\}$,

which we call the reproducing kernel EDRS. We wish to conduct inference
on $\mathcal{H}_{X,e}$ and $L^2_{X,e}$.

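Denote by $Z_t$ the inverse regression process $E(X_t \mid Y)$, $t \in T$. Clearly, $Z_t$ is
also a second-order stochastic process with mean 0. Denote the covariance
function of $Z_t$ by $K$. For $m = 1, 2, \ldots$ and $t_1, \ldots, t_m \in T$, let $\mathbf{X} = (X_{t_1}, \ldots, X_{t_m})^T$,
$\mathbf{Z} = (Z_{t_1}, \ldots, Z_{t_m})^T$ and $\mathbf{a} = (a_1, \ldots, a_m)^T \in \mathbb{R}^m$. We have

(3) $\mathrm{var}(\mathbf{a}^T \mathbf{X}) = \mathrm{var}(\mathbf{a}^T \mathbf{Z}) + E(\mathrm{var}(\mathbf{a}^T \mathbf{X} \mid Y))$.

This implies that

(4) $R \ge K$,

and it follows from Proposition 4 that $\mathcal{H}_K \subset \mathcal{H}_R$.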

Motivated by Theorem 3.1 of Li (1991), we make the following claim:

(5) The sample paths of $Z_t$ are in $\mathcal{H}_{X,e}$ a.s.

We will establish the validity of (5) in Theorem 6 below. However, let us
first assume that (5) holds and consider some implications. Since (5) implies
that the sample paths of $Z_t$ are in $\mathcal{H}_R$ a.s., we can define the covariance
operator

(6) $L := E\left(Z \otimes_{\mathcal{H}_R} Z\right)$,

where the tensor product $g \otimes_{\mathcal{H}_R} h$ denotes the linear operator that maps $f$
to $\langle g, f \rangle_{\mathcal{H}_R} \cdot h$ for $f, g, h \in \mathcal{H}_R$. By the reproducing property,

$(L R_t)(s) = E(\langle Z, R_t \rangle_{\mathcal{H}_R} Z_s) = E(Z_t Z_s) = K_t(s)$, $s, t \in T$,

which implies that $\mathrm{Im}(L) = \mathcal{H}_K$ and $L$ is the dominance operator of $\mathcal{H}_R$ over
$\mathcal{H}_K$ [cf. Definition 2 and (b) of Proposition 4]. On the other hand, it follows
readily from (5) that $K_t \in \mathcal{H}_{X,e}$ for all $t \in T$ and hence

$\mathrm{Im}(L) = \mathcal{H}_K \subset \mathcal{H}_{X,e}$.

Thus, $\dim(\mathcal{H}_K) \le \dim(\mathcal{H}_{X,e}) = p$. In particular, if $\mathcal{H}_K = \mathcal{H}_{X,e}$, estimating
the eigenfunctions of $L$ provides an approach for estimating $\mathcal{H}_{X,e}$. Clearly,
establishing (5) is crucial.
Remarks. (a) Note that $\mathcal{H}_K$ is not always equal to $\mathcal{H}_{X,e}$. See Cook
(1998) for a thorough discussion of this and related issues. Extending the
ideas in this paper to deal with those situations will be a topic of future
research.

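(b) Let $X$ be multivariate, that is, finite-dimensional, and denote it
by $\mathbf{X}$ for clarity. As before, assume that $\mathbf{X}$ has mean 0 and covariance matrix
$\mathbf{R}$, and let $\mathbf{Z} = E(\mathbf{X} \mid Y)$. Then $\mathcal{H}_R$ contains elements spanned by the column
vectors of $\mathbf{R}$, where

(7) $\langle \mathbf{f}, \mathbf{g} \rangle_{\mathcal{H}_R} = E(\mathbf{f}^T \mathbf{R}^- \mathbf{X} \mathbf{X}^T \mathbf{R}^- \mathbf{g}) = \mathbf{f}^T \mathbf{R}^- \mathbf{g}$

and

(8) $\Psi_X^{-1} \mathbf{f} = (\mathbf{R}^- \mathbf{f})^T \mathbf{X}$, $\mathbf{f} \in \mathcal{H}_R$,

$\mathbf{R}^-$ being the Moore-Penrose generalized inverse of $\mathbf{R}$. Li (1991) showed
that $\mathbf{Z} \in \mathcal{H}_{X,e}$ with probability 1, and it follows that

(9) $L \mathbf{g} = E\left(\mathbf{Z} \otimes_{\mathcal{H}_R} \mathbf{Z}\right) \mathbf{g} = E(\langle \mathbf{Z}, \mathbf{g} \rangle_{\mathcal{H}_R} \mathbf{Z}) = E(\mathbf{Z} \mathbf{Z}^T) \mathbf{R}^- \mathbf{g} =: \mathbf{K} \mathbf{R}^- \mathbf{g}$.

By (7) and (9), the eigenvectors of $L$ in $\mathcal{H}_R$ are $\mathbf{R}^{1/2} \mathbf{g}_j$, with the $\mathbf{g}_j$ denoting
the eigenvectors of $\mathbf{R}^{-1/2} \mathbf{K} \mathbf{R}^{-1/2}$ in the Euclidean space. Provided
that $\mathcal{H}_K = \mathcal{H}_{X,e}$, it follows from (8) that $L^2_{X,e}$ is estimated by the span of
$\Psi_X^{-1}(\mathbf{R}^{1/2} \mathbf{g}_j) = (\mathbf{R}^{-1/2} \mathbf{g}_j)^T \mathbf{X}$. This completely agrees with the result for the
multivariate setting described in Li (1991).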
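The algebra in Remark (b) is easy to verify numerically. The sketch below is only a check of the eigenvector identity, under assumptions made for the example: the matrices standing in for $\mathbf{R}$ and $\mathbf{K}$ are randomly generated (an arbitrary positive definite matrix and an arbitrary rank-two nonnegative definite matrix), rather than covariances derived from an actual inverse regression model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Assumed stand-in for the covariance matrix R of X (positive definite).
A = rng.normal(size=(d, d))
R = A @ A.T + d * np.eye(d)

# Assumed stand-in for K = E(Z Z^T); any nonnegative definite matrix
# suffices for checking the algebra.
B = rng.normal(size=(d, 2))
K = B @ B.T

# Symmetric square roots of R via its eigendecomposition.
w, U = np.linalg.eigh(R)
R_half = (U * np.sqrt(w)) @ U.T
R_neg_half = (U / np.sqrt(w)) @ U.T

# Eigenvectors g_j of R^{-1/2} K R^{-1/2} in Euclidean space.
lam, G = np.linalg.eigh(R_neg_half @ K @ R_neg_half)

# L g = K R^- g, so L (R^{1/2} g_j) should equal lam_j R^{1/2} g_j.
L = K @ np.linalg.inv(R)
F = R_half @ G
residual = float(np.max(np.abs(L @ F - F * lam)))

# Coefficient vectors R^{-1/2} g_j of the corresponding indices.
directions = R_neg_half @ G
gen_residual = float(np.max(np.abs(K @ directions - (R @ directions) * lam)))
```

Both `residual` and `gen_residual` should vanish up to rounding. The second confirms that the coefficient vectors $\mathbf{R}^{-1/2} \mathbf{g}_j$ solve the generalized eigenproblem $\mathbf{K} \mathbf{b} = \lambda \mathbf{R} \mathbf{b}$.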

At first glance, (5) might seem a straightforward extension of similar 
results for the multivariate case in Li (1991), or the functional case in Ferre 
and Yao (2003, 2005). However, a closer inspection reveals that this is not 
the case, and, to establish it, a deeper understanding of the relationship 
between a RKHS and the sample paths of a stochastic process is called 
for. To give an idea of where the difficulties lie, recall that Parzen (1963)
showed that almost all the sample paths of $X$ lie outside of $\mathcal{H}_R$ if $T$ is an
infinite separable metric space and $R$ is continuous on $T \times T$; for instance,
if $X$ is standard Brownian motion on $[0, 1]$, then the paths are nowhere
differentiable with probability 1 but $\mathcal{H}_R$ contains functions with square-integrable
derivatives. In those situations, $Z_t$ is an average of paths that
are a.s. not in $\mathcal{H}_R$, let alone $\mathcal{H}_{X,e}$. Driscoll (1973) gave sufficient conditions
under which the sample paths of a Gaussian process fall into a RKHS; Lukić
and Beder (2001) provided a much more general treatment of this class of
problems, going beyond Gaussianity. The following development is partly 
inspired by their results. 

In addition to the conditions (IR1) and (IR2) in Section 1, define the
following condition:

(IR3) Either $T$ is countable, or both $R$ is continuous on $T \times T$ and the
sample paths of $E(X_t \mid Y)$ are continuous on $T$ with probability 1.

The following can be proved.

Theorem 6. Assume the conditions (IR1)-(IR3). Then (5) holds.

The proof of Theorem 6 will be given in Section 6. An approach for estimating
$L^2_{X,e}$ by estimating $L$ and $\Psi_X$ as explained above will be developed
in Section 4.

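To introduce sliced inverse regression (SIR) in $\mathcal{H}_R$, consider the stochastic
process

$Z_t^{\mathcal{G}} = E(Z_t \mid \mathcal{G})$

for a given $\sigma$-field $\mathcal{G}$. An example of $\mathcal{G}$ is the $\sigma$-field generated by the sets
$\{\omega \in \Omega : Y \in I_s\}$, $s = 1, \ldots, S$, where the $I_s$, called slices in Li (1991), are disjoint
sets forming a partition of the range of $Y$. Denote by $K^{\mathcal{G}}$ the covariance
function of $Z_t^{\mathcal{G}}$. The same variance decomposition argument as in (3) shows
that $K \ge K^{\mathcal{G}}$, and Proposition 4 implies that $\mathcal{H}_{K^{\mathcal{G}}} \subset \mathcal{H}_K$. If the conditions
of Theorem 6 hold, then we have

(10) $\mathcal{H}_{K^{\mathcal{G}}} \subset \mathcal{H}_K \subset \mathcal{H}_{X,e}$.

Denote the dominance operator of $\mathcal{H}_R$ over $\mathcal{H}_{K^{\mathcal{G}}}$ by $L^{\mathcal{G}}$, which is the covariance
operator

(11) $L^{\mathcal{G}} = E\left(Z^{\mathcal{G}} \otimes_{\mathcal{H}_R} Z^{\mathcal{G}}\right)$.

As before, estimating the eigenfunctions of $L^{\mathcal{G}}$ also estimates $\mathcal{H}_{X,e}$ if
$\mathcal{H}_{K^{\mathcal{G}}} = \mathcal{H}_{X,e}$.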

4. Estimation and asymptotic theory. Assume without further reference 
in this section that the conditions (IR1)-(IR3) hold so that the conclusion 
of Theorem 6 holds. The primary goals of this section are to describe a 
procedure of estimation based on SIR, and to develop an asymptotic theory 
for the procedure. 

In view of the description in Section 3, the estimations of the covariance 
function R and inverse regression covariance function K are clearly crucial 
elements in this problem. In some cases, these could be done more efficiently 
if the precise nature of the sample paths of X is known. For example, in 
the infinite-dimensional case if the sample paths of X are m-times continu- 
ously differentiable for some m, the incorporation of the information in non- 
parametric estimation procedures may lead to a faster rate of convergence 
in estimating R and K [see Rice and Silverman (1991), Silverman (1996),
James, Hastie and Sugar (2000), Ramsay and Silverman (2005) and Wu and 
Pourahmadi (2003)]. However, we will not make such assumptions here, as 
our aim is to consider a general procedure whose principles and properties 
will, for a large part, transcend the detailed nature of the path properties 
of the second-order stochastic process X. Indeed, the development below 
simultaneously addresses both the finite- and infinite-dimensional cases. 

We continue to use the notation defined in Section 3. In addition, for a
real symmetric, nonnegative-definite matrix $A$, let $\lambda_j(A)$ be the $j$th largest
eigenvalue of $A$; if the eigendecomposition of $A$ is

$A = \sum_j \lambda_j(A) u_j u_j^T$,

define the generalized power

$A^\alpha = \sum_{\lambda_j(A) > 0} \lambda_j^\alpha(A) u_j u_j^T$, $\alpha \in \mathbb{R}$.

Note that $A^{-1}$ is the Moore-Penrose inverse of $A$ and we denote it by $A^-$.

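We will focus on estimating $L^2_{X,e}$ by SIR, that is, assume that
$\mathcal{H}_{K^{\mathcal{G}}} = \mathcal{H}_{X,e}$, where $\mathcal{G}$ is the $\sigma$-field generated by the sets
$\{\omega \in \Omega : Y \in I_s\}$, $s = 1, \ldots, S$, the sets $I_1, \ldots, I_S$ forming a partition of the range of $Y$ with

$p_s := P(Y \in I_s) > 0$ for each $s$.

Note that $Z^{\mathcal{G}} = E(E(X \mid Y) \mid \mathcal{G}) = E(X \mid \mathcal{G})$, so

$L^{\mathcal{G}} = E\left(E(X \mid \mathcal{G}) \otimes_{\mathcal{H}_R} E(X \mid \mathcal{G})\right) = \sum_{s=1}^{S} p_s h_s \otimes_{\mathcal{H}_R} h_s$,

where

(12) $h_s = E(X \mid Y \in I_s)$.

To fix ideas, let the eigenvalues of $L^{\mathcal{G}}$ be distinct; for $1 \le j \le p$, let $f_j$ denote
the eigenfunction corresponding to the $j$th largest eigenvalue of $L^{\mathcal{G}}$, and,
without loss of generality, let $\xi_j = \Psi_X^{-1}(f_j)$.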

Let $(Y_i, X_{i,t})$, $1 \le i \le n$, be $n$ i.i.d. realizations of $(Y, X_t)$. However, we
only observe $Y_i, X_{i,t_j}$, $1 \le i \le n$, $1 \le j \le J_n$, for some finite $J_n$. Let

$\mathbf{X}_i = (X_{i,t_1}, \ldots, X_{i,t_{J_n}})^T$ and $\mathbf{h}_s = E(\mathbf{X}_i \mid Y_i \in I_s)$

and, for each $J$,

(13) $R_J = \{E(X_{t_i} X_{t_j})\}_{i,j=1}^{J}$.

Estimate $p_s$, $\mathbf{h}_s$ and $R_{J_n}$, respectively, by the empirical estimators

$\hat{p}_s = \frac{1}{n} \sum_{i=1}^{n} I(Y_i \in I_s)$, $\hat{\mathbf{h}}_s = \frac{\sum_{i=1}^{n} \mathbf{X}_i I(Y_i \in I_s)}{\sum_{i=1}^{n} I(Y_i \in I_s)}$ and $R_{n,J_n} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i \mathbf{X}_i^T$.

If $\mathbf{X}_i$ is not centered, then we need to center where appropriate in $\hat{\mathbf{h}}_s$ and
$R_{n,J_n}$. As mentioned in the beginning of this section, in practice, if $X$ has
smooth paths, then incorporating that information in the estimation of $\mathbf{h}_s$
and $R_J$ may lead to estimators that are more efficient than the naive ones
defined here.

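For $k \le J$, let $P_{J,k}$ and $P_{n,J_n,k}$ be the projection matrices onto the
eigenspaces of the first $k$ eigenvalues of $R_J$ and $R_{n,J_n}$, respectively; let

(14) $R_{J,k} = P_{J,k} R_J P_{J,k}$ and $R_{n,J_n,k} = P_{n,J_n,k} R_{n,J_n} P_{n,J_n,k}$.

Our proposed estimator of $\xi_j$ is

(15) $\hat{\xi}_{n,k,j} = (X_{t_1}, \ldots, X_{t_{J_n}}) R_{n,J_n,k}^{-1/2} \hat{\mathbf{v}}_j =: (X_{t_1}, \ldots, X_{t_{J_n}}) \hat{\beta}_{n,k,j}$,

where $\hat{\mathbf{v}}_j$ is the eigenvector corresponding to the $j$th largest eigenvalue of

(16) $\hat{M}_{n,k} := R_{n,J_n,k}^{-1/2} \left( \sum_{s=1}^{S} \hat{p}_s \hat{\mathbf{h}}_s \hat{\mathbf{h}}_s^T \right) R_{n,J_n,k}^{-1/2}$

in $\mathbb{R}^{J_n}$. Note that $X$ in (15) is a generic process whose sole purpose is to
facilitate the definition of the estimator $\hat{\xi}_{n,k,j}$. Below we will investigate the
convergence of $\hat{\xi}_{n,k,j}$ to $\xi_j$ in $L_X^2$.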
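To make the steps behind (13)-(16) concrete, the following is a minimal sketch of the truncated SIR computation on simulated data. It is an illustration under stated assumptions rather than the implementation used for the numerical results in this paper: the function names `truncated_power` and `sir_estimate`, the use of slices with roughly equal counts, and the simulation settings (discretized Brownian motion, a single index, noise standard deviation 0.1, $n = 1000$, $J = 50$) are all choices made for the example.

```python
import numpy as np

def truncated_power(A, alpha, k):
    """Generalized power of the rank-k truncation of a symmetric PSD matrix A."""
    vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]   # keep the k largest
    return (vecs * vals ** alpha) @ vecs.T

def sir_estimate(X, y, k, n_slices=10):
    """Return (beta_hat, eigenvalues); column j of beta_hat is R_k^{-1/2} v_j."""
    n, J = X.shape
    Xc = X - X.mean(axis=0)                   # center the discretized paths
    R_hat = Xc.T @ Xc / n                     # sample covariance matrix
    R_neg_half = truncated_power(R_hat, -0.5, k)

    # Assumed slicing rule: roughly equal counts, from the ordering of y.
    between = np.zeros((J, J))
    for idx in np.array_split(np.argsort(y), n_slices):
        h_s = Xc[idx].mean(axis=0)            # slice mean of the centered paths
        between += (len(idx) / n) * np.outer(h_s, h_s)

    M_hat = R_neg_half @ between @ R_neg_half
    vals, vecs = np.linalg.eigh(M_hat)
    order = np.argsort(vals)[::-1]            # largest eigenvalue first
    return R_neg_half @ vecs[:, order], vals[order]

# Assumed simulation: discretized Brownian motion with a single index.
rng = np.random.default_rng(0)
n, J = 1000, 50
t = np.arange(1, J + 1) / J
X = np.cumsum(rng.normal(scale=np.sqrt(1.0 / J), size=(n, J)), axis=1)
xi = X @ np.sin(1.5 * np.pi * t) / J          # Riemann sum for the index
y = np.exp(xi) + rng.normal(scale=0.1, size=n)

beta_hat, eigvals = sir_estimate(X, y, k=2)
xi_hat = (X - X.mean(axis=0)) @ beta_hat[:, 0]
corr = abs(np.corrcoef(xi, xi_hat)[0, 1])
```

The absolute value in `corr` accounts for the sign ambiguity of eigenvectors. Because $\sin(3\pi s/2)$ is proportional to the second eigenfunction of the Brownian-motion covariance, the index in this example lies essentially in the span of the first two principal components, which is why $k = 2$ is used.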

Remarks. (a) The procedure described above is not entirely new. If $\mathbf{X}_i$
is finite-dimensional with $J_n = J$, then taking $k = J$ reduces the procedure
above to that in Li (1991). In the infinite-dimensional case, the estimation of
the eigenspaces of $R_{J_n}$ corresponding to small eigenvalues is typically unstable,
in which case $k$ acts as a smoothing parameter that controls the trade-off
between bias and variance. Chiaromonte and Martinelli (2002) considered a
similar approach in the context of analyzing gene-expression data.

(b) Ferre and Yao (2003) assume that the paths of $X_i$ are in a known
Hilbert space $\mathcal{H}$. Their procedure is a "continuous" version of ours, since
they assume that functional data $X_i$ are observed in their entirety. Of course,
functional data are never observed in their entirety, so some kind of discrete
approximation will have to be incorporated to implement their procedure.
As such, there is little difference between their procedure and ours in that
setting.

(c) In the infinite-dimensional case, if the observational points are different
for different $X_i$, then smoothing of the observed data $X_i$ becomes necessary. In
that case, the quantities $\hat{\mathbf{h}}_s$ and $R_{n,J_n}$ will be computed based on the
smoothed data. The details of this will be worked out in future work.

We proceed to explain the motivations of $\hat{\xi}_{n,k,j}$ and develop an asymptotic
theory. Let $J_n$ be nondecreasing and tending to some $J_\infty$ as $n \to \infty$, where
$J_\infty$ is assumed to be $\infty$ for the infinite-dimensional case. A related issue
for the infinite-dimensional case is that we stated earlier that we observe
$X_{i,t_j}$, $1 \le i \le n$, $1 \le j \le J_n$, at stage $n$, but we did not specify the manner
in which the set of observation points $t_1, \ldots, t_{J_n}$ changes with $n$. There are
two options in that regard. The first one is to consider the fully general case
where each $t_j$ actually also depends on $n$ so that $t_j = t_{n,j}$. Another option
is to consider a nested sequence of sets

$T_J := \{t_1, \ldots, t_J\}$, $J \ge 1$,

where the $t_j$ do not depend on $J$, so that more observation points will simply
be added to each $X_i$ as $n$ increases. As far as the proofs go, the two cases
require similar arguments. However, since the nested-sequence assumption
entails slightly simpler details and much cleaner notation, we will take that
approach.

First define two technical conditions, both of which amount to requiring
that the leading eigenvalues of $R_J$ dominate the rest. The first condition is

(17) $\lim_{k \to k_\infty} \limsup_{J \to J_\infty} \mathrm{tr}((R_J^- - R_{J,k}^-) K_J) = 0$ for some $k_\infty \le J_\infty$,

where $K_J = \{K(t_i, t_j)\}_{i,j=1}^{J}$, and $R_J$ and $R_{J,k}$ are as defined in (13) and
(14), respectively. It is shown by Lemma 12 below that, under very general
conditions, $\lim_{J \to J_\infty} \mathrm{tr}(R_J^- K_J) < \infty$, which can be shown to be the trace of
the dominance operator from $\mathcal{H}_R$ to $\mathcal{H}_K$. In that light, the condition (17) is
quite mild.

To motivate the second technical condition, observe that if
$0 < \inf_t E(X_t^2) \le \sup_t E(X_t^2) < \infty$, then

(18) $\mathrm{tr}(R_J) = \sum_{j \ge 1} \lambda_j(R_J) = \sum_{i=1}^{J} E(X_{t_i}^2) = O(J)$.

If the random variables $X_{t_i}$, $1 \le i \le J$, are uncorrelated, $R_J$ is a diagonal
matrix and all of the eigenvalues are bounded away from 0 and $\infty$. In the
infinite-dimensional case, we wish to avoid this type of situation and focus
on those where the strength of dependence among the $X_{t_i}$ increases as $J$
increases, so that the leading eigenvectors of $R_J$ dominate. In that case,
gaps of size $O(J)$ can be expected to exist between leading eigenvalues. The
second technical condition is, for $m$ equal to a fixed positive integer,

(19) $\liminf_{J \to J_\infty} \frac{\rho_m(R_J)}{J} > 0$,

where

(20) $\rho_m(R_J) = \min\{|\lambda_j(R_J) - \lambda_m(R_J)| : \lambda_j(R_J) \ne \lambda_m(R_J)\}$.

Indeed, the conditions (17) and (19) are extremely general, as reflected by
the following result.
Proposition 7. Let $T = [a, b]$ be any compact interval, and let $t_1, \ldots, t_J$ be equally
spaced in $T$. If the covariance function $R$ is continuous on $T \times T$, then (17)
holds with $k_\infty = J_\infty = \infty$. If, additionally, the multiplicity of $\lambda_m(Q)$ is 1,
where $Q$ is the integral operator $Q : f \mapsto \int_a^b R(\cdot, y) f(y) \, dy$, $f \in L^2[a, b]$, then
(19) holds as well.
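The scaling behind condition (19) can be illustrated numerically for Brownian motion on $[0, 1]$, whose covariance operator has eigenvalues $4 / ((2j - 1)^2 \pi^2)$. The sketch below is only an illustration; the grid size $J = 200$ and the function name `scaled_eigenvalues` are assumptions made for the example. It compares the leading eigenvalues of $R_J / J$ on an equally spaced grid with the eigenvalues of the integral operator $Q$.

```python
import numpy as np

def scaled_eigenvalues(J, n_eigs=5):
    """Leading eigenvalues of R_J / J for the Brownian kernel on J equally spaced points."""
    t = np.arange(1, J + 1) / J
    R_J = np.minimum.outer(t, t)
    return np.linalg.eigvalsh(R_J)[::-1][:n_eigs] / J

# Eigenvalues of the integral operator Q for Brownian motion on [0, 1].
j = np.arange(1, 6)
theory = 4.0 / ((2 * j - 1) ** 2 * np.pi ** 2)

scaled = scaled_eigenvalues(J=200)
gaps_over_J = -np.diff(scaled)   # gaps between consecutive eigenvalues of R_J, divided by J
```

The entries of `scaled` are close to those of `theory`, and the entries of `gaps_over_J` stay bounded away from zero, which is the behavior that (19) requires of the leading eigenvalues.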

The first step in establishing the estimator (15) is to compare $L^{\mathcal{G}}$ with
the following operator:

$L^{\mathcal{G}}_{n,k} := \sum_{s=1}^{S} p_s \tilde{h}_s \otimes_{\mathcal{H}_R} \tilde{h}_s$,

where

(21) $\tilde{h}_s = \tilde{h}_{s,n,k} = (R(\cdot, t_1), \ldots, R(\cdot, t_{J_n})) R_{J_n,k}^- \mathbf{h}_s$.

Let $\|\cdot\|_\infty$ denote the sup or uniform norm of an operator.

Lemma 8. Assume that either $T = \bigcup_{J=1}^{\infty} T_J$, or $\bigcup_{J=1}^{\infty} T_J$ is dense in $T$
and $R$ is continuous on $T \times T$. Also assume that (17) holds. Then we have

$\|L^{\mathcal{G}}_{n,k} - L^{\mathcal{G}}\|_\infty \to 0$ as $n \to \infty$ and then $k \to k_\infty$.

Under the conclusion of Lemma 8, the eigenvalues and eigenfunctions of
$L^{\mathcal{G}}_{n,k}$ converge in probability to those of $L^{\mathcal{G}}$, where convergence of the eigenfunctions
is in terms of the norm of $\mathcal{H}_R$. The convergence of the eigenvalues
follows from Corollary 4 on page 1090 of Dunford and Schwartz (1988). The
convergence of the eigenfunctions follows from the convergence of projection
operators of eigenspaces, which can be established as in Gohberg and
Krein (1969), page 15 [see also Dauxois, Pousse and Romain (1982), pages
141-142].

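Next, we express the eigenproblem of $L^{\mathcal{G}}_{n,k}$ as an eigenproblem in $\mathbb{R}^{J_n}$.

Lemma 9. Let $(\lambda_j, \mathbf{u}_j)$ be the eigenvalues and eigenvectors of

(22) $M_{n,k} := R_{J_n,k}^{-1/2} \left( \sum_{s=1}^{S} p_s \mathbf{h}_s \mathbf{h}_s^T \right) R_{J_n,k}^{-1/2}$

in $\mathbb{R}^{J_n}$. Then, for each $j$, $\lambda_j$ is an eigenvalue of $L^{\mathcal{G}}_{n,k}$ and
$(R(\cdot, t_1), \ldots, R(\cdot, t_{J_n})) R_{J_n,k}^{-1/2} \mathbf{u}_j$ is the corresponding eigenfunction.

Thus, under the assumptions of Lemma 8,

(23) $\|(R(\cdot, t_1), \ldots, R(\cdot, t_{J_n})) R_{J_n,k}^{-1/2} \mathbf{u}_j - f_j\|_{\mathcal{H}_R} \to 0$

as $n \to \infty$ and then $k \to k_\infty$.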
Let

$\tilde{\xi}_{n,k,j} := (X_{t_1}, \ldots, X_{t_{J_n}}) R_{J_n,k}^{-1/2} \mathbf{u}_j$.

Since $\Psi_X$ is an isometric isomorphism (Proposition 2) and $\Psi_X(\xi_j) = f_j$, (23)
is equivalent to

(24) $\|\tilde{\xi}_{n,k,j} - \xi_j\|_{L_X^2} \to 0$ as $n \to \infty$ and then $k \to k_\infty$.

However, since $R$ is unknown, $\tilde{\xi}_{n,k,j}$ cannot be directly used for inference.
Intuitively, $(\lambda_j, \mathbf{u}_j)$ in Lemma 9 can be estimated by the eigenvalues and
eigenvectors of $\hat{M}_{n,k}$ in (16). The following result provides the justification.

Lemma 10. Assume that $\sup_t E(X_t^4) < \infty$. Also assume that $J_n = o(n)$
and (19) holds for $m = k$, a fixed positive integer. Then

(25) $\|\hat{M}_{n,k} - M_{n,k}\|_\infty \stackrel{p}{\to} 0$ as $n \to \infty$.

Also, if $\mathbf{u}_j$ and $\hat{\mathbf{v}}_j$ are the eigenvectors corresponding to the $j$th eigenvalues
of $M_{n,k}$ and $\hat{M}_{n,k}$, respectively, we have

(26) $\|\hat{\xi}_{n,k,j} - \tilde{\xi}_{n,k,j}\|_{L_X^2} \stackrel{p}{\to} 0$ as $n \to \infty$.

Combining (24) and (26), we have:

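Theorem 11. Assume that $\sup_t E(X_t^4) < \infty$, $J_n = o(n)$, and either
$T = \bigcup_{J=1}^{\infty} T_J$, or $\bigcup_{J=1}^{\infty} T_J$ is dense in $T$ and $R$ is continuous on $T \times T$. Also
assume that (17) holds, and that (19) holds for all $m \in \mathcal{K} = \{k_1, k_2, \ldots\}$
where $k_\ell \to k_\infty$. Then for each $j$,

(27) $\|\hat{\xi}_{n,k_\ell,j} - \xi_j\|_{L_X^2} \stackrel{p}{\to} 0$ as $n \to \infty$ and then $\ell \to \infty$.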

Remarks. (a) The interpretation of (27) is that, under the assumptions
of the theorem, there exists a sequence $\ell_n$ such that $\hat{\xi}_{n,k_{\ell_n},j} \stackrel{p}{\to} \xi_j$ in $L_X^2$.
In reality, $k_{\ell_n}$ is picked so that $R_{n,J_n,k_{\ell_n}}$ and $R^-_{n,J_n,k_{\ell_n}}$ estimate $R_{J_n}$ and
$R^-_{J_n,k_{\ell_n}}$, respectively, well.

(b) We conjecture that the assumption $J_n = o(n)$ can be considerably
relaxed. The assumption is needed because, in our proofs, we bound the
distances between certain operators using the Hilbert-Schmidt norm. Relaxing
the condition requires a different approach to bounding those distances,
which is beyond our reach at this point.
5. Numerical examples. We now demonstrate the methodology in Section 4
with some numerical examples. Examples 1 and 2 are based on computer
simulations, and Example 3 contains an analysis of real data.

In order to implement the methodology in Section 4, we need to know
how to choose $k$ in $\hat{\xi}_{n,k,j}$. Recall that $k$ is a smoothing parameter which
controls the bias/variance trade-off. Also recall from the asymptotic theory
[cf. (17) and (19)] that our procedure is designed to deal with situations
where the effective dimension of the data is much smaller than $J_n$, the actual
length of the data vector. In practice, $k$ can be chosen subjectively to ensure
that $\sum_{j=1}^{k} \lambda_j(R_{n,J_n}) / \sum_{\text{all } j} \lambda_j(R_{n,J_n})$ is close to 1, and yet the eigenvalues
$\lambda_j(R_{n,J_n})$, $1 \le j \le k$, are not "too small." However, the following data-driven
procedure for choosing $k$ may be useful. Consider the model

(28) $Y = \ell(\xi_1, \ldots, \xi_p) + \varepsilon$

and assume that $\ell$ is smooth. For each feasible $k$ and $i = 1, \ldots, n$, we leave
out $(\mathbf{x}_i, y_i)$ and use the rest of the data $(\mathbf{x}_{[-i]}, y_{[-i]})$ to compute the $\hat{\xi}_{n,k,j}$ in
(15) and nonparametrically estimate $\ell$; use the $\hat{\xi}_{n,k,j}$, the estimated $\ell$, and
$\mathbf{x}_i$ to compute a predicted value $\hat{y}_{i,k}$; let $CV(k) = \sum_{i=1}^{n} (y_i - \hat{y}_{i,k})^2$, and pick
$k$ to minimize $CV(k)$. Instead of leaving one datum out at a time, given
enough data, we can also divide the data into training and testing samples
in computing $CV$; see Example 3. These cross-validation procedures are not
ideal since we need to know $p$ in advance, and the nonparametric fitting
adds an extra layer of complication. A more satisfactory procedure that is
free of these problems is currently not available.
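The subjective rule based on the proportion of explained variation is simple to automate. The sketch below covers only that first criterion (cumulative proportion close to 1) and does not check that the retained eigenvalues are not too small; the function name `smallest_k` and the threshold values are assumptions made for the illustration, not recommendations from this paper.

```python
import numpy as np

def smallest_k(eigenvalues, threshold=0.9):
    """Smallest k whose leading k eigenvalues explain at least `threshold` of the total."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    # Index of the first cumulative proportion that reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(lam))

# Toy eigenvalues with cumulative proportions 0.6, 0.9 and 1.0.
k_small = smallest_k([6.0, 3.0, 1.0], threshold=0.85)
```

With the toy eigenvalues above, two components are needed to reach a cumulative proportion of 0.85, so `k_small` equals 2. In practice the input would be the eigenvalues of the sample covariance matrix.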

The number of slices S in SIR is another issue. However, it is a relatively 
minor one which usually does not change the outcomes of the analysis in a 
big way. We let S = 10 in all of the following examples. 

Example 1. Let $\{X(t), t \in [0, 1]\}$ be a standard Brownian motion,
$\varepsilon \sim N(0, 0.3^2)$, and

$Y = \exp\left(\int_0^1 \beta(s) X(s) \, ds\right) + \varepsilon$,

where $\beta(s) = \sin(3\pi s/2)$. Hence $\xi = \int_0^1 \beta(s) X(s) \, ds$. A sample of $n = 100$
i.i.d. $(x_i, y_i)$ were generated, where each $x_i$ was observed at 100 equally
spaced time points in $[0, 1]$. The first five eigenvalues of the sample covariance
$R_{n,J_n}$ are 35.17, 4.06, 1.65, 0.75 and 0.54, compared to the first five
theoretical eigenvalues 0.405285, 0.045031, 0.016211, 0.008271, and 0.005003
of the Brownian motion in $L^2[0, 1]$. The amounts of variation in the sample
explained by the first five eigenvectors of the sample covariance cumulatively
are 0.80, 0.89, 0.93, 0.94 and 0.96. The cross-validation procedure described
in the beginning of this section selected $k = 2$. The plots of $\hat{\beta}$, $\hat{\xi}$ versus $\xi$,
and $y$ versus $\hat{\xi}$ are displayed in Figure 1. See (15) for the definitions of $\hat{\beta}$
and $\hat{\xi}$. We also estimated the link function $\ell$ by smoothing spline, which is
displayed along with the plot for $y$ versus $\hat{\xi}$. It is not surprising that $\hat{\beta}$ is not
smooth since no smoothing took place in computing it. If desired, a smoothing
procedure can be incorporated in the eigendecomposition of $\hat{M}_{n,k}$ [see,
e.g., Silverman (1996)]. Note that the results presented are based on one
single simulation run. However, the quality of the estimates, especially for $\ell$
and $\xi$, is largely representative of what is obtained in repeated simulations.
In particular, the sample correlations were seen to be averaging over 0.98 in
repeated simulation runs.



Example 2. Consider the model in which $X$ is a fractional Gaussian
process on $[0, 1]$ with self-similarity index $H = 0.75$ [cf. Samorodnitsky and
Taqqu (1994)], and

$Y = \tan^{-1}\left(\sum_{j=30}^{32} X_{j/120} + \sum_{j=90}^{92} X_{j/120}\right) + \varepsilon$,

where $\varepsilon \sim N(0, 0.3^2)$. Note that, in this case, $\xi$ cannot be written as an
$L^2[0, 1]$ inner product of a smooth curve $\beta$ with $X$. A sample of $n = 80$ i.i.d.
$(x_i, y_i)$ were generated, where each $x_i$ was observed at 120 equally spaced
time points in $[0, 1]$. The same methodology as in Example 1 was applied, and
the results are displayed in Figure 2. The variation in the sample explained
by the first four eigenvectors of the sample covariance $R_{n,J_n}$ exceeded 99%.
However, cross-validation picked $k = 8$. Other simulation runs produced
qualitatively similar results.




Fig. 1. The leftmost plot is $\beta$ (smooth curve) and $\hat\beta$ (nonsmooth curve) versus $t$, the
middle plot is $\hat\xi$ versus $\xi$, and the right plot is $y$ versus $\hat\xi$.
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Example 3. Consider a set of data recorded by the Tecator Infratec 
Food and Feed Analyzer, available at http://lib.stat.cmu.edu/datasets/tecator,
which were analyzed by Ferré and Yao (2005), Amato, Antoniadis
and Feis (2006) and Ferraty and Vieu (2006). Each food sample
contains finely minced pork meat with different contents of fat, protein and
moisture. During the experiment, the spectrometer measured the spectrum
of light transmitted through the sample in the region 850-1050 nanometers
(nm). For each meat sample, the data consist of a 100-channel spectrum of
absorption and the contents of fat, protein and moisture. The spectral data
are partially observed functional data, whereas fat, protein and moisture
contents are multivariate data. The spectral data are transformed to $-\log_{10}$
of their original value. In this example, we focus on the regression of fat
content $U$ on the spectrum $X$. In accordance with the literature, we perform the
normalizing transformation $Y = \log_{10}(U/(1 - U))$.

The sample size of these data is 240, and, as in Amato, Antoniadis and
Feis (2006), we use the first 125 for training and the remaining 115 for
validation. The first three eigenvectors of the sample covariance $\mathbf{R}_{n,J_n}$ explain
over 99.5% of the total variation. For different values of $k$, we used the first
four of the estimated edr variables to estimate $\ell$, where the smoothing spline
ANOVA function ssanova in R [cf. Gu (2002)] was our fitting algorithm. The
validated prediction errors, $\{n^{-1}\sum_{i=1}^n (y_i - \hat y_i)^2\}^{1/2}$, for $k = 5$, 21 and 25 were
0.06842495, 0.04481962 and 0.07414923, respectively, with $k = 21$ achieving
the smallest prediction error. With $k = 21$, the two plots on the left of Figure
3 are the estimates of the first two RKHS edr functions, and the plot on the
right of Figure 3 is $\hat y := \hat\ell(\hat\xi_1, \hat\xi_2, \hat\xi_3, \hat\xi_4)$ versus $y$ for the validation sample.

Fig. 3. The two plots on the left describe the estimated RKHS edr functions $\hat f_1$ and $\hat f_2$
for Tecator data, and the plot on the right is $\hat y = \hat\ell(\hat\xi_1, \hat\xi_2, \hat\xi_3, \hat\xi_4)$ versus $y$.

6. Proofs.

Proof of Proposition 2. Consider $\eta = \sum_{i=1}^m a_i X(s_i)$, $\xi = \sum_{j=1}^n b_j X(t_j)$.
By the reproducing property,

$$\left\langle \sum_{i=1}^m a_i R(\cdot, s_i), \sum_{j=1}^n b_j R(\cdot, t_j) \right\rangle_{\mathcal{H}_R} = \sum_{i=1}^m \sum_{j=1}^n a_i b_j R(s_i, t_j),$$

which is $E(\eta\xi)$. The equality extends readily to general random variables
in $L_X^2$ since random variables of the form $\eta$, $\xi$ are dense. □

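Proof of Proposition 4. Since

$$\frac{\left(\sum_{i=1}^n a_i f(t_i)\right)^2}{\sum_{i=1}^n \sum_{j=1}^n a_i a_j K_2(t_i, t_j)} \le \frac{\left(\sum_{i=1}^n a_i f(t_i)\right)^2}{\sum_{i=1}^n \sum_{j=1}^n a_i a_j K_1(t_i, t_j)},$$

(a) follows at once from Proposition 3. To show (b), note that for $f =
\sum_{i=1}^n c_i K_2(\cdot, t_i)$, we have $Lf = \sum_{i=1}^n c_i K_1(\cdot, t_i) =: f_1$, and hence

(29) $$\frac{\|Lf\|^2_{\mathcal{H}_{K_2}}}{\|f\|^2_{\mathcal{H}_{K_2}}} \le \frac{\|f_1\|^2_{\mathcal{H}_{K_1}}}{\|f\|^2_{\mathcal{H}_{K_2}}} = \frac{\sum_{i=1}^n \sum_{j=1}^n c_i c_j K_1(t_i, t_j)}{\sum_{i=1}^n \sum_{j=1}^n c_i c_j K_2(t_i, t_j)} \le 1.$$

Since the set of $f$ of the above form is dense in $\mathcal{H}_{K_2}$, (29) holds for all
$f \in \mathcal{H}_{K_2}$. This shows that $L$ is bounded. That $L$ is nonnegative and
self-adjoint can be seen easily by the reproducing property. □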

Proof of Proposition 5. Part (a) follows at once from Fortet's formula
in Proposition 3. To show (b), we focus on the case where $K$ and $f$ are
continuous. Note that for any finite set of points $\{t_1, \ldots, t_n\} \subset T$ and
constants $a_1, \ldots, a_n \in \mathbb{R}$ such that $\sum_{i,j=1}^n a_i a_j K(t_i, t_j) \ne 0$, it follows from
the continuity of $K$ and $f$, together with Fortet's formula, that



Then it is clear that $\lim_{n\to\infty} \|f_n\|_{\mathcal{H}_{K_n}}$ is equal to the expression on the
left-hand side of (2), and (b) follows from Proposition 3. If, instead, $T$ is
countable, then the above proof can be easily adapted to yield the desired
conclusion; the details are omitted. □

Lemma 12. Let $T$ be a separable metric space. Assume that $K_1$ and $K_2$
are covariance kernels on $T \times T$ such that:

(a) $K_2 \ge K_1$, and

(b) either $T$ is countable or $K_2$ is continuous.

Define a countable set $S_0 = \{s_1, s_2, \ldots\}$ which is equal to $T$ if $T$ is countable,
and some arbitrary dense subset of $T$ otherwise. Denote by $L$ the dominance
operator of $\mathcal{H}_{K_2}$ over $\mathcal{H}_{K_1}$, and, for $i = 1, 2$, let $K_{i,n}$ be the restriction of $K_i$
to $S_n \times S_n$ where $S_n = \{s_1, s_2, \ldots, s_n\}$. Then we can compute $\mathrm{tr}(L)$ by the
formula

$$\mathrm{tr}(L) = \lim_{n\to\infty} \mathrm{tr}(K_{1,n} K_{2,n}^-),$$

where $K_{2,n}^-$ is the Moore-Penrose generalized inverse of $K_{2,n}$.

Proof. Let $K_{i,0}$ be the restriction of $K_i$ to $S_0 \times S_0$. We first establish
that $\mathcal{H}_{K_i}$ and $\mathcal{H}_{K_{i,0}}$ are isometrically isomorphic. If $T = S_0$, there is nothing
to prove. So we focus on the case where $K_2$ is continuous and $S_0$ is a dense
subset of $T$. Note that, since $K_2 \ge K_1$, the continuity of $K_2$ implies that of
$K_1$. For $s, s' \in S_0$,

$$\|K_{i,0}(\cdot, s) - K_{i,0}(\cdot, s')\|^2_{\mathcal{H}_{K_{i,0}}} = \|K_i(\cdot, s) - K_i(\cdot, s')\|^2_{\mathcal{H}_{K_i}} = K_i(s, s) + K_i(s', s') - 2K_i(s, s'),$$

which tends to 0 if $s, s'$ both approach a fixed point $t \in T$ by continuity. By
completeness, $K_i(\cdot, t)|_{S_0} \in \mathcal{H}_{K_{i,0}}$ for each $t \in T$. Then it is easy to see that
$\mathcal{H}_{K_i}$ and $\mathcal{H}_{K_{i,0}}$ are isometrically isomorphic. Thus, it suffices to prove

$$\mathrm{tr}(L_0) = \lim_{n\to\infty} \mathrm{tr}(K_{1,n} K_{2,n}^-),$$

where $L_0$ is the dominance operator of $\mathcal{H}_{K_{2,0}}$ over $\mathcal{H}_{K_{1,0}}$. This follows from
the argument below [cf. Lukic and Beder (2001)]. Apply the Gram-Schmidt
procedure to the functions $K_{2,0}(\cdot, s_i)$, $i = 1, 2, \ldots$, to obtain a CONS $e_{1,0}, e_{2,0}, \ldots$
of $\mathcal{H}_{K_{2,0}}$. Thus,

$$\mathrm{tr}(L_0) = \sum_{i=1}^\infty \langle L_0 e_{i,0}, e_{i,0} \rangle_{\mathcal{H}_{K_{2,0}}}.$$

Let $L_n$ be the dominance operator of $\mathcal{H}_{K_{2,n}}$ over $\mathcal{H}_{K_{1,n}}$, and $e_{i,n}$ be the
restriction of $e_{i,0}$ to $S_n$. It is clear that $e_{i,n}$, $1 \le i \le n$, form an orthonormal
basis for $\mathcal{H}_{K_{2,n}}$. Thus,

$$\mathrm{tr}(L_n) = \sum_{i=1}^n \langle L_n e_{i,n}, e_{i,n} \rangle_{\mathcal{H}_{K_{2,n}}} = \sum_{i=1}^n \langle L_0 e_{i,0}, e_{i,0} \rangle_{\mathcal{H}_{K_{2,0}}},$$

so that $\mathrm{tr}(L_0) = \lim_{n\to\infty} \mathrm{tr}(L_n)$. Viewing $L_n$, $K_{1,n}$ and $K_{2,n}$ as matrices, it
follows from (b) of Proposition 4 that $L_n K_{2,n} = K_{1,n}$. Thus, $L_n = K_{1,n} K_{2,n}^-$,
which completes the proof. □
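
The trace formula of Lemma 12 can be illustrated numerically. The sketch below is not from the paper; it rests on assumed kernels chosen so that the answer is known in closed form. Take $K_2(s,t) = \min(s,t)$ on $[0,1]$, whose Mercer expansion has eigenvalues $\lambda_j = 1/\{(j - 1/2)^2\pi^2\}$ and eigenfunctions $\phi_j(t) = \sqrt{2}\sin((j - 1/2)\pi t)$, and let $K_1 = \sum_{j \le 3} c_j \lambda_j \phi_j \otimes \phi_j$ with $0 \le c_j \le 1$. Then the dominance operator has eigenvalues $c_j$, so $\mathrm{tr}(L) = \sum_j c_j$, and the finite-grid traces should increase to that value along nested grids. The helper names `phi`, `lam` and `trace_on_grid` are hypothetical.

```python
import numpy as np

def phi(j, t):
    """j-th eigenfunction of the kernel min(s, t) on [0, 1], j = 1, 2, ..."""
    return np.sqrt(2.0) * np.sin((j - 0.5) * np.pi * t)

def lam(j):
    """j-th eigenvalue of the kernel min(s, t) on [0, 1]."""
    return 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)

def trace_on_grid(n, c):
    """tr(K_{1,n} K_{2,n}^-) on the assumed grid S_n = {1/n, 2/n, ..., 1}."""
    s = np.arange(1, n + 1) / n
    K2 = np.minimum.outer(s, s)
    K1 = sum(cj * lam(j) * np.outer(phi(j, s), phi(j, s))
             for j, cj in enumerate(c, start=1))
    return np.trace(K1 @ np.linalg.pinv(K2))

c = [1.0, 0.5, 0.25]                     # eigenvalues of the dominance operator
traces = [trace_on_grid(n, c) for n in (8, 16, 32, 64, 128, 256)]
print(traces)
print(sum(c))                            # tr(L) for this construction
```

The dyadic grids are nested, so they form a subsequence of the sets $S_n$ for a suitable enumeration of a dense subset of $[0,1]$, which is all the lemma requires.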

Lemma 13. Let $T$ be a separable metric space, and let $\{U_t, t \in T\}$ be
a second-order process on $T$ with mean 0 and covariance function $K_1$. Let
$K_2$ be another covariance kernel on $T \times T$ such that:

(a) $K_2 \gg K_1$, and

(b) either $T$ is countable, or both $K_2$ is continuous on $T \times T$ and the
sample paths of $U$ are continuous a.s. on $T$.

Then $P(U \in \mathcal{H}_{K_2}) = 1$.

Proof. Let $S_0$, $K_{1,n}$, $K_{2,n}$ and $L$ be as defined in Lemma 12. Note
that $\mathrm{tr}(L) < \infty$ by the assumption $K_2 \gg K_1$. Define $U_n = U|_{S_n}$. Since $U_n$ is
finite-dimensional, it is easily seen that $U_n \in \mathcal{H}_{K_{1,n}}$ a.s., which implies that
$U_n \in \mathcal{H}_{K_{2,n}}$ a.s. by (a) of Proposition 4. By (7) and the property of trace,

$$E(\|U_n\|^2_{\mathcal{H}_{K_{2,n}}}) = E(U_n^T K_{2,n}^- U_n) = E[\mathrm{tr}(U_n^T K_{2,n}^- U_n)] = E[\mathrm{tr}(U_n U_n^T K_{2,n}^-)]$$
$$= \mathrm{tr}[E(U_n U_n^T) K_{2,n}^-] = \mathrm{tr}(K_{1,n} K_{2,n}^-).$$

Since $\|U_n\|_{\mathcal{H}_{K_{2,n}}}$ is monotone by (a) of Proposition 5, it follows from the
monotone convergence theorem and Lemma 12 that

(30) $$E\left[\lim_{n\to\infty} \|U_n\|^2_{\mathcal{H}_{K_{2,n}}}\right] = \lim_{n\to\infty} \mathrm{tr}(K_{1,n} K_{2,n}^-) = \mathrm{tr}(L) < \infty.$$

This implies that $\lim_{n\to\infty} \|U_n\|_{\mathcal{H}_{K_{2,n}}} < \infty$ a.s., which, by (b) of Proposition
5, implies that $U \in \mathcal{H}_{K_2}$ a.s. □
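
The trace identity used in the proof of Lemma 13, $E(U_n^T K_{2,n}^- U_n) = \mathrm{tr}(K_{1,n} K_{2,n}^-)$, is easy to check by Monte Carlo. The following is a sketch under assumptions that are not in the paper: $U$ is taken to be Gaussian, $K_2(s,t) = \min(s,t)$, and $K_1$ is a rank-3 kernel built from the first three Brownian-motion eigenfunctions; the grid, weights and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
s = np.arange(1, n + 1) / n                      # assumed grid S_n
K2 = np.minimum.outer(s, s)                      # kernel min(s, t) restricted to S_n

# Assumed K1: rank-3 kernel sum_j w_j phi_j(s) phi_j(t) with w_j = c_j * lambda_j.
freq = (np.arange(1, 4) - 0.5) * np.pi
Phi = np.sqrt(2.0) * np.sin(np.outer(s, freq))   # n x 3 matrix of eigenfunction values
w = np.array([1.0, 0.5, 0.25]) / freq ** 2
K1 = (Phi * w) @ Phi.T

K2_pinv = np.linalg.pinv(K2)
exact = np.trace(K1 @ K2_pinv)                   # tr(K_{1,n} K_{2,n}^-)

# Draw U_n ~ N(0, K1) and average the quadratic form U_n^T K_{2,n}^- U_n.
Z = rng.standard_normal((20000, 3))
U = (Z * np.sqrt(w)) @ Phi.T                     # each row has covariance K1
quad = np.einsum("ri,ij,rj->r", U, K2_pinv, U)
mc = quad.mean()
print(exact, mc)
```

The identity itself needs only second moments; Gaussianity is used here solely to make sampling convenient.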

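Proof of Theorem 6. The proof is accomplished in three steps below.

(a) Verify that $\dim(\mathcal{H}_K) \le p$.

By definition $L_Z^2 = \overline{\mathrm{span}}\{Z_t, t \in T\}$. It follows from (IR1) and (IR2) that
for each $t \in T$,

$$Z_t = E(E(X_t | Y, \xi_1, \ldots, \xi_p) | Y) = E(E(X_t | \xi_1, \ldots, \xi_p) | Y) = \sum_{i=1}^p c_{i,t} E(\xi_i | Y) \quad \text{a.s.}$$

for some constants $c_{i,t}$. It follows that $Z_t \in \mathrm{span}\{E(\xi_i | Y), i = 1, \ldots, p\}$, $t \in T$.
Consequently, $L_Z^2 \subset \mathrm{span}\{E(\xi_i | Y), i = 1, \ldots, p\}$, and hence $\dim(\mathcal{H}_K) =
\dim(L_Z^2) \le p$.

(b) Verify that $Z \in \mathcal{H}_R$ a.s. By step (a) and (4), we conclude at once
that $K$ and the dominance operator $L$ of $\mathcal{H}_R$ over $\mathcal{H}_K$ are of finite rank,
and hence $L$ is nuclear with $\mathrm{tr}(L) < \infty$. Thus, the desired conclusion here follows
from Lemma 13 under the condition (IR3).

(c) Finally, prove that $Z \in \mathcal{H}_{R,e}$ a.s. We will show that $\langle Z, h \rangle_{\mathcal{H}_R} = 0$ a.s.
for any $h \in \mathcal{H}_R$ such that

(31) $$\langle h, \Psi_X(\xi_i) \rangle_{\mathcal{H}_R} = 0, \quad 1 \le i \le p.$$

Fix such an $h$ and let $\xi = \Psi_X^{-1}(h) \in L_X^2$. If $h = R(\cdot, t)$, then $\xi = X_t$, and, by the
reproducing property, we obtain

$$\langle Z, h \rangle_{\mathcal{H}_R} = Z_t = E(X_t | Y) = E(\xi | Y).$$

Hence, in general, we have

$$\langle Z, h \rangle_{\mathcal{H}_R} = E(\xi | Y) \quad \text{for all } h \in \mathcal{H}_R.$$

By the properties of conditional expectation and (IR1),

$$E(\xi | Y) = E(E(\xi | \xi_1, \ldots, \xi_p, Y) | Y) = E(E(\xi | \xi_1, \ldots, \xi_p) | Y).$$

Thus, it suffices to show that the above right-hand side equals 0, which we
now do. Since by (IR2), $E(\xi | \xi_1, \ldots, \xi_p) = \sum_{i=1}^p c_i \xi_i$ for some $c_i$, $1 \le i \le p$, we
have

$$E(E^2(\xi | \xi_1, \ldots, \xi_p)) = E\left(\sum_{i=1}^p c_i \xi_i\, E(\xi | \xi_1, \ldots, \xi_p)\right) = E\left(\sum_{i=1}^p c_i E(\xi \xi_i | \xi_1, \ldots, \xi_p)\right) = \sum_{i=1}^p c_i E(\xi \xi_i),$$

which, by isometry and (31), is equal to

$$\sum_{i=1}^p c_i \langle \Psi_X(\xi), \Psi_X(\xi_i) \rangle_{\mathcal{H}_R} = \sum_{i=1}^p c_i \langle h, \Psi_X(\xi_i) \rangle_{\mathcal{H}_R} = 0.$$

Thus, $E(\xi | \xi_1, \ldots, \xi_p) = 0$ a.s. and therefore $E(E(\xi | \xi_1, \ldots, \xi_p) | Y) = 0$ a.s. The
proof is complete. □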

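Proof of Proposition 7. Without loss of generality, take $[a, b]$ to be
$[0, 1]$ and, for convenience, let $t_0 = 0$ and $t_j = j/J$, $1 \le j \le J$. Let $R_J$ be the
discretized version of $R$:

$$R_J(s, t) = \sum_{i,j=1}^J R(t_i, t_j)\, I((s, t) \in [t_{i-1}, t_i) \times [t_{j-1}, t_j)).$$

Define the integral operator $Q_J : f \to \int_0^1 R_J(\cdot, y) f(y)\, dy$ on $L^2[0, 1]$. Note that
$Q$ is Hilbert-Schmidt and hence has a countable number of eigenvalues. It is
straightforward to verify that $Q_J$ has the same eigenvalues as $J^{-1}\mathbf{R}_J$, and
$Q_J$ converges to $Q$ in uniform norm. Thus, $\lambda_j(\mathbf{R}_J) \sim J\lambda_j(Q)$ for each fixed
$j$. Hence, (19) holds if the multiplicity of $\lambda_m(Q)$ is 1.

To show that (17) holds, let $(\lambda_i, \phi_i)$ be the eigenvalues and eigenfunctions
of $Q$. By Mercer's theorem, $R(s, t) = \sum_i \lambda_i \phi_i(s) \phi_i(t)$. Define

$$R_{(k)}(s, t) = \sum_{i \ge k+1} \lambda_i \phi_i(s) \phi_i(t).$$

For any $f \in \mathcal{H}_R$, write $f_{(k)} = \sum_{i \ge k+1} \langle f, \phi_i \rangle_{L^2[0,1]} \phi_i$. It is obvious that

(32) $$\|f_{(k)}\|_{\mathcal{H}_{R_{(k)}}} \le \|f\|_{\mathcal{H}_R}.$$

Now we claim that

(33) $$\lim_{k\to\infty} \|f_{(k)}\|_{\mathcal{H}_{R_{(k)}}} = 0, \quad f \in \mathcal{H}_R.$$

Given $\varepsilon > 0$, there exist some finite $M$ and constants $c_m$ such that the
approximation $\tilde f = \sum_{m=1}^M c_m R(\cdot, t_m)$ of $f$ satisfies $\|f - \tilde f\|_{\mathcal{H}_R} < \varepsilon$. Write

$$f_{(k)} = (f - \tilde f)_{(k)} + (\tilde f)_{(k)}.$$

The first term on the right-hand side is bounded by $\varepsilon$ by (32). Note that
$(\tilde f)_{(k)} = \sum_{m=1}^M c_m R_{(k)}(\cdot, t_m)$ so that

$$\|(\tilde f)_{(k)}\|^2_{\mathcal{H}_{R_{(k)}}} = \mathbf{c}^T \mathbf{R}_{(k)} \mathbf{c} \to 0 \quad \text{as } k \to \infty,$$

where $\mathbf{R}_{(k)} = \{R_{(k)}(t_i, t_j)\}_{i,j=1}^M$. This shows that

$$\limsup_{k\to\infty} \|f_{(k)}\|_{\mathcal{H}_{R_{(k)}}} \le \varepsilon.$$

Since $\varepsilon$ is arbitrary, (33) follows. Now for any process $\{U_t, t \in [0, 1]\}$ whose sample
paths are in $\mathcal{H}_R$ a.s. and $E(\|U\|^2_{\mathcal{H}_R}) < \infty$, by (32), (33) and Lebesgue's
dominated convergence theorem,

$$\lim_{k\to\infty} E(\|U_{(k)}\|^2_{\mathcal{H}_{R_{(k)}}}) = 0.$$

In particular,

$$\lim_{k\to\infty} E(\|Z_{(k)}\|^2_{\mathcal{H}_{R_{(k)}}}) = 0,$$

where $Z = E(X | Y)$. However, by the proof of Lemma 13,

$$E(\|Z_{(k)}\|^2_{\mathcal{H}_{R_{(k)}}}) = \lim_{J\to\infty} \mathrm{tr}((\mathbf{R}_J^- - \mathbf{R}_{J,k}^-) \mathbf{K}_J).$$

Hence, (17) holds. □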
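
The eigenvalue scaling $\lambda_j(\mathbf{R}_J) \sim J\lambda_j(Q)$ used in this proof can be checked directly for a kernel whose spectrum is known. The sketch below assumes the Brownian-motion kernel $R(s,t) = \min(s,t)$, for which $\lambda_j(Q) = 1/\{(j - 1/2)^2\pi^2\}$; the grid size is an arbitrary illustrative choice and nothing here is taken from the paper's own computations.

```python
import numpy as np

J = 200
t = np.arange(1, J + 1) / J                      # grid t_j = j/J as in the proof
R_J = np.minimum.outer(t, t)                     # matrix {R(t_i, t_j)} for R = min(s, t)

top = np.linalg.eigvalsh(R_J)[::-1][:5]          # five largest eigenvalues of R_J
theory = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)   # eigenvalues of Q
ratio = top / (J * theory)                       # should approach 1 as J grows
print(ratio)
```

For this kernel the ratio exceeds 1 by roughly $1/J$ for the leading eigenvalues, which is consistent with the sample eigenvalues in Example 1 being about 100 times the theoretical ones.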



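For each $J$, let $\mathcal{H}_{R_J}$ be the subspace of $\mathcal{H}_R$ spanned by $R(\cdot, t_j)$, $j = 1, \ldots, J$.
Let $\mathbf{R}_J = \{R(t_i, t_j)\}_{i,j=1}^J$. Each $f \in \mathcal{H}_{R_J}$ can be written as
$f = (R(\cdot, t_1), \ldots, R(\cdot, t_J))\mathbf{c}$, $\mathbf{c} \in \mathbb{R}^J$, where, by the reproducing property,
$\|f\|^2_{\mathcal{H}_R} = \mathbf{c}^T \mathbf{R}_J \mathbf{c}$.
Thus, without loss of generality, write

$$\mathcal{H}_{R_J} = \{(R(\cdot, t_1), \ldots, R(\cdot, t_J))\mathbf{c} : \mathbf{c} \in \mathrm{Im}(\mathbf{R}_J)\}.$$

Let $\Pi_J$ be the projection operator from $\mathcal{H}_R$ into $\mathcal{H}_{R_J}$. Also define the space

$$\mathcal{H}_{R_{J,k}} = \{(R(\cdot, t_1), \ldots, R(\cdot, t_J))\mathbf{c} : \mathbf{c} \in \mathrm{Im}(\mathbf{R}_{J,k})\}$$

and the projection $\Pi_{J,k}$ from $\mathcal{H}_R$ into $\mathcal{H}_{R_{J,k}}$.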

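Lemma 14. For any $f \in \mathcal{H}_R$, and $J \ge k \ge 1$,

(34) $$\Pi_{J,k} f = (R(\cdot, t_1), \ldots, R(\cdot, t_J)) \mathbf{R}_{J,k}^- \mathbf{f},$$

where $\mathbf{f} = (f(t_1), \ldots, f(t_J))^T$.

Proof. By the reproducing property, for any $\mathbf{a} \in \mathrm{Im}(\mathbf{R}_{J,k})$,

$$0 = \langle f - \Pi_{J,k} f, (R(\cdot, t_1), \ldots, R(\cdot, t_J))\mathbf{a} \rangle_{\mathcal{H}_R}$$
$$= (f(t_1), \ldots, f(t_J))\mathbf{a} - ((\Pi_{J,k} f)(t_1), \ldots, (\Pi_{J,k} f)(t_J))\mathbf{a},$$

so that

(35) $$(f(t_1), \ldots, f(t_J))\mathbf{a} = ((\Pi_{J,k} f)(t_1), \ldots, (\Pi_{J,k} f)(t_J))\mathbf{a}.$$

Write

$$\Pi_{J,k} f = (R(\cdot, t_1), \ldots, R(\cdot, t_J))\mathbf{c}, \quad \mathbf{c} \in \mathrm{Im}(\mathbf{R}_{J,k}),$$

and we will show that $\mathbf{c} = \mathbf{R}_{J,k}^- \mathbf{f}$. Evaluating both sides at $t_1, \ldots, t_J$ and
pre-multiplying the resulting vectors by $\mathbf{R}_{J,k}^-$, we obtain

$$\mathbf{R}_{J,k}^- [(\Pi_{J,k} f)(t_1), \ldots, (\Pi_{J,k} f)(t_J)]^T = \mathbf{R}_{J,k}^- \mathbf{R}_J \mathbf{c} = \mathbf{c}.$$

Since the rows of $\mathbf{R}_{J,k}^-$ are in $\mathrm{Im}(\mathbf{R}_{J,k})$, it follows from (35) that

$$\mathbf{R}_{J,k}^- [(\Pi_{J,k} f)(t_1), \ldots, (\Pi_{J,k} f)(t_J)]^T = \mathbf{R}_{J,k}^- \mathbf{f}.$$

Hence, $\mathbf{c} = \mathbf{R}_{J,k}^- \mathbf{f}$ and the result follows. □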
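
A small numerical check of the projection formula (34) may help fix ideas. The sketch assumes, as an interpretation not spelled out in this section, that $\mathbf{R}_{J,k}$ is the rank-$k$ truncation of $\mathbf{R}_J$ built from its $k$ leading eigenpairs; the kernel $\min(s,t)$ and the test function are arbitrary illustrative choices. It verifies the orthogonality relation (35), the identity $\mathbf{R}_{J,k}^-\mathbf{R}_J\mathbf{c} = \mathbf{c}$, and the norm identity $\|\Pi_{J,k} f\|^2_{\mathcal{H}_R} = \mathbf{c}^T\mathbf{R}_J\mathbf{c} = \mathbf{f}^T\mathbf{R}_{J,k}^-\mathbf{f}$.

```python
import numpy as np

J, k = 50, 3
t = np.arange(1, J + 1) / J
R_J = np.minimum.outer(t, t)                 # assumed kernel: R(s, t) = min(s, t)

# Assumed construction of R_{J,k}: keep the k leading eigenpairs of R_J.
vals, vecs = np.linalg.eigh(R_J)
V = vecs[:, ::-1][:, :k]                     # top-k eigenvectors (columns)
lam_k = vals[::-1][:k]                       # top-k eigenvalues
R_Jk_pinv = (V / lam_k) @ V.T                # Moore-Penrose inverse of R_{J,k}

f = t * np.sin(2 * np.pi * t)                # values f(t_1), ..., f(t_J) of a test function
c = R_Jk_pinv @ f                            # coefficient vector in (34)
proj_values = R_J @ c                        # (Pi_{J,k} f)(t_1), ..., (Pi_{J,k} f)(t_J)

lhs_35 = V.T @ f                             # (35) tested against a basis of Im(R_{J,k})
rhs_35 = V.T @ proj_values
norm_sq = c @ R_J @ c                        # squared RKHS norm of the projection
print(np.max(np.abs(lhs_35 - rhs_35)), norm_sq, f @ R_Jk_pinv @ f)
```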

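Lemma 15. Assume that either $T = \bigcup_{J=1}^{J_\infty} T_J$, or $\bigcup_{J=1}^{J_\infty} T_J$ is dense in $T$
and $R$ is continuous on $T \times T$. We also assume that (17) holds. Then,

$$E\|(I - \Pi_{J,k}) Z\|^2_{\mathcal{H}_R} \to 0 \quad \text{as } J \to J_\infty \text{ and then as } k \to k_\infty,$$

where $I$ is the identity mapping.

Proof. If $T = \bigcup_{J=1}^{J_\infty} T_J$, then by definition $\mathcal{H}_R = \overline{\mathrm{span}}\{R(\cdot, t_j), j = 1, 2, \ldots\}$.
Now suppose $\bigcup_{J=1}^{J_\infty} T_J$ is dense in $T$, where $J_\infty = \infty$, and $R$ is
continuous on $T \times T$. Then as in the proof of Lemma 12, for $t_{j_i} \to t$ as $i \to \infty$,
the sequence of functions $R(\cdot, t_{j_i})$ is Cauchy and must converge to $R(\cdot, t)$.
Hence $R(\cdot, t) \in \overline{\mathrm{span}}\{R(\cdot, t_j), j = 1, 2, \ldots\}$ for each $t$, and we also have
$\mathcal{H}_R = \overline{\mathrm{span}}\{R(\cdot, t_j), j = 1, 2, \ldots\}$. Thus, in either case, we conclude

$$\limsup_{J\to\infty} \|(I - \Pi_J) g\|_{\mathcal{H}_R} = 0, \quad g \in \mathcal{H}_R,$$

and, by Lebesgue's dominated convergence theorem,

(36) $$\limsup_{J\to\infty} E\|(I - \Pi_J) Z\|^2_{\mathcal{H}_R} = 0.$$

Next we establish

(37) $$\lim_{k\to k_\infty} \limsup_{J\to J_\infty} E\|(\Pi_J - \Pi_{J,k}) Z\|^2_{\mathcal{H}_R} = 0,$$

which together with (36) implies the result. By Lemma 14,

$$(\Pi_J - \Pi_{J,k}) Z = (R(\cdot, t_1), \ldots, R(\cdot, t_J))(\mathbf{R}_J^- - \mathbf{R}_{J,k}^-)\mathbf{Z},$$

where $\mathbf{Z} = (Z(t_1), \ldots, Z(t_J))^T$. Hence,

$$\|(\Pi_J - \Pi_{J,k}) Z\|^2_{\mathcal{H}_R} = \mathbf{Z}^T (\mathbf{R}_J^- - \mathbf{R}_{J,k}^-) \mathbf{R}_J (\mathbf{R}_J^- - \mathbf{R}_{J,k}^-) \mathbf{Z} = \mathrm{tr}((\mathbf{R}_J^- - \mathbf{R}_{J,k}^-) \mathbf{Z}\mathbf{Z}^T).$$

Since $E(\mathbf{Z}\mathbf{Z}^T) = \mathbf{K}_J$, (37) follows from (17). □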

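Corollary 16. Assume the conditions of Lemma 15. Let $\mathcal{F}$ be a $\sigma$-field
and $Z^{\mathcal{F}} = E(Z | \mathcal{F})$. Then

(38) $$E\|(I - \Pi_{J,k}) Z^{\mathcal{F}}\|^2_{\mathcal{H}_R} \to 0 \quad \text{as } J \to J_\infty \text{ and then as } k \to k_\infty.$$

For $h_s$ defined in (12), $s = 1, \ldots, S$,

(39) $$\|(I - \Pi_{J,k}) h_s\|_{\mathcal{H}_R} \to 0 \quad \text{as } J \to J_\infty \text{ and then as } k \to k_\infty.$$

Proof. Since the covariance kernel of $Z^{\mathcal{F}}$ is dominated by $K$, (38) follows from the same proof as that of the lemma
with $Z^{\mathcal{F}}$ replacing $Z$ everywhere. To prove (39), letting $\mathcal{G}$ be the $\sigma$-field
based on which $h_s$ is defined, we have $Z^{\mathcal{G}} = h_s$ if $Y \in I_s$. Hence,

$$E\|(I - \Pi_{J,k}) Z^{\mathcal{G}}\|^2_{\mathcal{H}_R} = \sum_{s=1}^S \|(I - \Pi_{J,k}) h_s\|^2_{\mathcal{H}_R}\, P(Y \in I_s).$$

Then (39) clearly follows from this and (38). □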






T. HSING AND H. REN 



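Proof of Lemma 8. We will first show that

(40) $$\|\hat h_s - h_s\|_{\mathcal{H}_R} \stackrel{p}{\to} 0 \quad \text{as } n \to \infty \text{ and then } k \to k_\infty.$$

Write, by Lemma 14,

$$\hat h_s - h_s = (R(\cdot, t_1), \ldots, R(\cdot, t_{J_n})) \mathbf{R}_{J_n,k}^- (\hat{\mathbf{h}}_s - \mathbf{h}_s) + (\Pi_{J_n,k} - I) h_s.$$

The second term is taken care of by Corollary 16. To show that the first
term tends to 0 in probability in $\mathcal{H}_R$, note that it is equivalent to showing
that

(41) $$(\hat{\mathbf{h}}_s - \mathbf{h}_s)^T \mathbf{R}_{J_n,k}^- (\hat{\mathbf{h}}_s - \mathbf{h}_s) \stackrel{p}{\to} 0 \quad \text{as } n \to \infty.$$

Let $\tilde{\mathbf{h}}_s = (n p_s)^{-1} \sum_{i=1}^n \mathbf{X}_i I(Y_i \in I_s)$. Write

$$E((\tilde{\mathbf{h}}_s - \mathbf{h}_s)^T \mathbf{R}_{J_n,k}^- (\tilde{\mathbf{h}}_s - \mathbf{h}_s)) = \mathrm{tr}[\mathbf{R}_{J_n,k}^- E((\tilde{\mathbf{h}}_s - \mathbf{h}_s)(\tilde{\mathbf{h}}_s - \mathbf{h}_s)^T)].$$

By independence,

$$E((\tilde{\mathbf{h}}_s - \mathbf{h}_s)(\tilde{\mathbf{h}}_s - \mathbf{h}_s)^T) = \frac{1}{(n p_s)^2} \sum_{i=1}^n \sum_{j=1}^n E(\mathbf{X}_i \mathbf{X}_j^T I(Y_i \in I_s) I(Y_j \in I_s)) - \mathbf{h}_s \mathbf{h}_s^T$$
$$= \frac{1}{n p_s^2} E(\mathbf{X}_1 \mathbf{X}_1^T I(Y_1 \in I_s)) - \frac{1}{n} \mathbf{h}_s \mathbf{h}_s^T \le \frac{1}{n p_s^2} \mathbf{R}_{J_n}.$$

Thus,

(42) $$E((\tilde{\mathbf{h}}_s - \mathbf{h}_s)^T \mathbf{R}_{J_n,k}^- (\tilde{\mathbf{h}}_s - \mathbf{h}_s)) \le \frac{1}{n p_s^2} \mathrm{tr}(\mathbf{R}_{J_n,k}^- \mathbf{R}_{J_n}),$$

which tends to 0 as $n \to \infty$, since $\mathrm{tr}(\mathbf{R}_{J_n,k}^- \mathbf{R}_{J_n,k}) \le k$. This proves (41) with
$\hat{\mathbf{h}}_s$ replaced by $\tilde{\mathbf{h}}_s$. Since $\hat p_s \stackrel{p}{\to} p_s$, (41) follows as well. This completes the
proof of (40).

Next, for $f \in \mathcal{H}_R$ with $\|f\|_{\mathcal{H}_R} = 1$,

$$\|(\hat h_s \otimes \hat h_s) f - (h_s \otimes h_s) f\|_{\mathcal{H}_R} = \|\langle \hat h_s, f \rangle_{\mathcal{H}_R} \hat h_s - \langle h_s, f \rangle_{\mathcal{H}_R} h_s\|_{\mathcal{H}_R}$$
$$\le \|\langle \hat h_s, f \rangle_{\mathcal{H}_R} \hat h_s - \langle \hat h_s, f \rangle_{\mathcal{H}_R} h_s\|_{\mathcal{H}_R} + \|\langle \hat h_s, f \rangle_{\mathcal{H}_R} h_s - \langle h_s, f \rangle_{\mathcal{H}_R} h_s\|_{\mathcal{H}_R}$$
$$\le \|\hat h_s - h_s\|_{\mathcal{H}_R} \|\hat h_s\|_{\mathcal{H}_R} + \|\hat h_s - h_s\|_{\mathcal{H}_R} \|h_s\|_{\mathcal{H}_R}.$$

It follows from (40) that the right-hand side tends to 0 in probability, and
hence

$$\|\hat h_s \otimes \hat h_s - h_s \otimes h_s\| \stackrel{p}{\to} 0 \quad \text{as } n \to \infty \text{ and then } k \to k_\infty.$$

The result follows from this. □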

Proof of Lemma 9. Consider the linear mapping T that maps T~Crj^ j. 
to Wrj^ j. such that 

r : {R{-,ti), R{-,tjJ)c ^ Rj„c = Rj„,fcC, c e Im(Rj„,fc). 

It is easy to see that T is an isometric isomorphism. The operator in 
^R-jn fc corresponds to in Hrj^ ^ is 

L^ = Y,Pshs (g) h„ 

where = Pj„ ^hg. Representing an eigenvector of as Rj^^^c, c G Im(Rj„ fc) 
by the reproducing property, the eigenequation of is 

s 

^PsPj„,ki^si^'^'Pjn,kC = X'Rj„^kC, c^Rj„,fcC = l, cGlm(Rj„,fc), 

s=l 

which is equivalent to 

s 

Ps^jJ,k^sK^jJ,k'^ = >''^^ u u = l, c = R^^;^u. 

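For a square matrix $\mathbf{A}$ containing complex elements, let $\|\mathbf{A}\|_{HS} = \sqrt{\mathrm{tr}(\mathbf{A}\mathbf{A}^*)}$,
where $\mathbf{A}^* = \bar{\mathbf{A}}^T$ is the conjugate transpose of $\mathbf{A}$. $\|\mathbf{A}\|_{HS}$ is known as the Hilbert-Schmidt (HS) norm or
Frobenius norm of $\mathbf{A}$. See Dunford and Schwarz (1988), page 1010, or Horn
and Johnson (1990), Chapter 5. Note that $\|\mathbf{A}\|_\infty \le \|\mathbf{A}\|_{HS}$.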
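
The norm facts used repeatedly below can be confirmed numerically. This sketch reads $\|\mathbf{A}\|_\infty$ as the operator (spectral) norm, which is an interpretation on our part rather than something stated here; the matrices, the seed and the helper name `hs_norm` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

def hs_norm(M):
    """Hilbert-Schmidt norm sqrt(tr(M M*)), with M* the conjugate transpose."""
    return np.sqrt(np.trace(M @ M.conj().T).real)

op_norm = np.linalg.norm(A, 2)               # largest singular value of A
print(hs_norm(A), np.linalg.norm(A, "fro"), op_norm)
print(hs_norm(A @ B), hs_norm(A) * hs_norm(B))
```

The second printed pair illustrates the submultiplicative bound $\|\mathbf{A}\mathbf{B}\|_{HS} \le \|\mathbf{A}\|_{HS}\|\mathbf{B}\|_{HS}$ invoked in the proofs of Lemmas 18 and 10.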

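Lemma 17. Assume that $\sup_t E(X_t^4) < \infty$. Then

$$E\|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|^2_{HS} \le C J_n^2 / n$$

for some universal constant $C$.

Proof. By definition, $E\|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|^2_{HS} = \mathrm{tr}(E(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})^2)$. By
independence,

$$E(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})^2 = \frac{1}{n}\left(E[(\mathbf{X}_1\mathbf{X}_1^T)^2] - E^2(\mathbf{X}_1\mathbf{X}_1^T)\right) \le \frac{1}{n} E[(\mathbf{X}_1\mathbf{X}_1^T)^2].$$

Hence,

$$E\|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|^2_{HS} \le \frac{1}{n} E(\mathrm{tr}(\mathbf{X}_1\mathbf{X}_1^T))^2 = \frac{1}{n} E(\|\mathbf{X}_1\|^4_{\mathbb{R}^{J_n}}) \le C\frac{J_n^2}{n}. \quad □$$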
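
The $J_n^2/n$ rate in Lemma 17 can be illustrated by simulation. The sketch below is an assumption-laden example, not part of the paper: $X$ is taken to be Brownian motion observed at $t_j = j/J$, for which the Gaussian moment formula gives $E\|\mathbf{R}_{n,J} - \mathbf{R}_J\|^2_{HS} = \{\mathrm{tr}(\mathbf{R}_J)^2 + \mathrm{tr}(\mathbf{R}_J^2)\}/n$ exactly, and $\sup_t E(X_t^4) = 3$ so that $C = 3$ works in the bound. The values of `J`, `n`, `reps` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
J, n, reps = 20, 50, 4000
t = np.arange(1, J + 1) / J
R = np.minimum.outer(t, t)                       # population covariance matrix R_J

# reps independent data sets, each consisting of n Brownian paths on the grid.
X = np.cumsum(rng.normal(scale=np.sqrt(1.0 / J), size=(reps, n, J)), axis=2)
R_n = np.einsum("rni,rnj->rij", X, X) / n        # sample covariances (zero-mean process)
err = ((R_n - R) ** 2).sum(axis=(1, 2))          # squared HS norm for each data set
mc = err.mean()

exact = (np.trace(R) ** 2 + np.trace(R @ R)) / n # closed form, Gaussian case only
bound = 3.0 * J ** 2 / n                         # C * J^2 / n with C = sup_t E(X_t^4) = 3
print(mc, exact, bound)
```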

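Lemma 18. Assume that $\sup_t E(X_t^4) < \infty$. Also assume that $J_n = o(n)$
and (19) holds for $m = k$, a fixed positive integer. Then

$$\|(\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2}) \mathbf{P}_{J_n,k}\|_{HS} = O_p(1/\sqrt{n}).$$

Proof. Our goal is to show that for any given $\varepsilon > 0$ there exists $\delta$ such
that

$$\limsup_{n\to\infty} P(\|(\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2}) \mathbf{P}_{J_n,k}\|_{HS} > \delta/\sqrt{n}) < \varepsilon.$$

First we pick $\rho$ so that

$$\limsup_{n\to\infty} P(\|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|_{HS} > \rho J_n/\sqrt{n}) < \varepsilon,$$

which is possible by Lemma 17. Below we will show that on the event

(43) $$\|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|_{HS} \le \frac{\rho J_n}{\sqrt{n}},$$

we have, for some $\delta$,

(44) $$\|(\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2}) \mathbf{P}_{J_n,k}\|_{HS} \le \frac{\delta}{\sqrt{n}} \quad \text{for large } n.$$

Without loss of generality, assume that $\lambda_k(\mathbf{R}_{J_n}) > 0$. For definiteness,
let $r$ be a constant satisfying $0 < r < \liminf_{n\to\infty} (\lambda_k(\mathbf{R}_{J_n}) - \lambda_{k+1}(\mathbf{R}_{J_n}))/(2J_n)$. Denote by $\mathrm{i}$ the
imaginary unit. Let $A$ be the rectangle on the complex plane with vertices
$(\lambda_1(\mathbf{R}_{J_n}) + rJ_n) - rJ_n\mathrm{i}$, $(\lambda_1(\mathbf{R}_{J_n}) + rJ_n) + rJ_n\mathrm{i}$, $(\lambda_k(\mathbf{R}_{J_n}) - rJ_n) + rJ_n\mathrm{i}$,
$(\lambda_k(\mathbf{R}_{J_n}) - rJ_n) - rJ_n\mathrm{i}$, and let $\partial A$ be the boundary of $A$. The length
$\ell(\partial A)$ of $\partial A$ is $8rJ_n + 2(\lambda_1(\mathbf{R}_{J_n}) - \lambda_k(\mathbf{R}_{J_n}))$. By (18),

(45) $$\limsup_{n\to\infty} \frac{\ell(\partial A)}{J_n} < \infty.$$

Since $\rho/\sqrt{n} \to 0$, by Corollary 4 on page 1090 of Dunford and Schwarz (1988)
and (43), we have, for large $n$ and uniformly for all $j$,

$$|\lambda_j(\mathbf{R}_{n,J_n}) - \lambda_j(\mathbf{R}_{J_n})| \le \|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|_\infty \le \|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|_{HS} \le \frac{\rho J_n}{\sqrt{n}} < rJ_n.$$

Thus, $A$ contains $\lambda_j(\mathbf{R}_{J_n})$ and $\lambda_j(\mathbf{R}_{n,J_n})$, $1 \le j \le k$, but no other eigenvalues of either
$\mathbf{R}_{J_n}$ or $\mathbf{R}_{n,J_n}$. Also $A$ does not contain the complex origin. Let

$$\mathfrak{R}_{J_n}(z) = (z - \mathbf{R}_{J_n})^{-1} \quad \text{and} \quad \mathfrak{R}_{n,J_n}(z) = (z - \mathbf{R}_{n,J_n})^{-1}$$

be the resolvents of $\mathbf{R}_{J_n}$ and $\mathbf{R}_{n,J_n}$, respectively, where $z$ is a complex argument
restricted to the respective resolvent sets. By the Cauchy integral formula
[cf. Dunford and Schwarz (1988), page 568],

$$\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2} = \frac{1}{2\pi\mathrm{i}} \oint_{\partial A} z^{-1/2} [\mathfrak{R}_{n,J_n}(z) - \mathfrak{R}_{J_n}(z)]\, dz$$

and hence

(46) $$\|(\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2}) \mathbf{P}_{J_n,k}\|_{HS} \le \frac{1}{2\pi} \oint_{\partial A} |z|^{-1/2} \cdot \|(\mathfrak{R}_{n,J_n}(z) - \mathfrak{R}_{J_n}(z)) \mathbf{P}_{J_n,k}\|_{HS}\, |dz|.$$

Note that $\mathfrak{R}_{J_n}(z) \mathbf{P}_{J_n,k} = (z - \mathbf{R}_{J_n,k})^{-1}$, which is the resolvent of $\mathbf{R}_{J_n,k}$, and
we denote it subsequently as $\mathfrak{R}_{J_n,k}(z)$. Observe that

(47) $$\sup_{z \in \partial A} \|\mathfrak{R}_{J_n}(z)\|^2_{HS} = \sup_{z \in \partial A} \sum_{j=1}^{J_n} |z - \lambda_j(\mathbf{R}_{J_n})|^{-2} \le \frac{J_n}{(rJ_n)^2} = \frac{1}{r^2 J_n}$$

and

(48) $$\sup_{z \in \partial A} \|\mathfrak{R}_{J_n,k}(z)\|^2_{HS} = \sup_{z \in \partial A} \sum_{j=1}^{k} |z - \lambda_j(\mathbf{R}_{J_n})|^{-2} \le \frac{k}{(rJ_n)^2}.$$

Write

(49) $$\mathfrak{R}_{n,J_n}(z) \mathbf{P}_{J_n,k} = (z - \mathbf{R}_{J_n} - \mathbf{R}_{n,J_n} + \mathbf{R}_{J_n})^{-1} \mathbf{P}_{J_n,k} = (\mathfrak{R}_{J_n}(z)^{-1} - \mathbf{R}_{n,J_n} + \mathbf{R}_{J_n})^{-1} \mathbf{P}_{J_n,k} = (\mathbf{I} - \mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}))^{-1} \mathfrak{R}_{J_n,k}(z).$$

By (43), (47), the fact that $\|\mathbf{A}\mathbf{B}\|_{HS} \le \|\mathbf{A}\|_{HS}\|\mathbf{B}\|_{HS}$, and the assumption
$J_n = o(n)$,

(50) $$\sup_{z \in \partial A} \|\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})\|_{HS} \le \frac{\rho\sqrt{J_n}}{r\sqrt{n}} < 1 \quad \text{for large } n.$$

A standard argument [cf. (3.3) of Gohberg and Krein (1969)] shows that

(51) $$(\mathbf{I} - \mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}))^{-1} = \sum_{j=0}^\infty [\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})]^j.$$

By (49) and (51),

$$(\mathfrak{R}_{n,J_n}(z) - \mathfrak{R}_{J_n}(z)) \mathbf{P}_{J_n,k} = \sum_{j=1}^\infty [\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})]^j\, \mathfrak{R}_{J_n,k}(z)$$

and by the triangle inequality and (50),

(52) $$\|(\mathfrak{R}_{n,J_n}(z) - \mathfrak{R}_{J_n}(z)) \mathbf{P}_{J_n,k}\|_{HS} \le \frac{\|\mathfrak{R}_{J_n}(z)\|_{HS} \|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|_{HS} \|\mathfrak{R}_{J_n,k}(z)\|_{HS}}{1 - \|\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})\|_{HS}}.$$

Finally,

(53) $$\sup_{z \in \partial A} |z|^{-1/2} = (\lambda_k(\mathbf{R}_{J_n}) - rJ_n)^{-1/2} \le (\lambda_{k+1}(\mathbf{R}_{J_n}) + rJ_n)^{-1/2} \le (rJ_n)^{-1/2}.$$

By (46) and (52),

$$\|(\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2}) \mathbf{P}_{J_n,k}\|_{HS} \le \frac{\ell(\partial A)}{2\pi} \sup_{z \in \partial A} |z|^{-1/2} \cdot \frac{\|\mathfrak{R}_{J_n}(z)\|_{HS} \|\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n}\|_{HS} \|\mathfrak{R}_{J_n,k}(z)\|_{HS}}{1 - \|\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})\|_{HS}},$$

from which (44) follows using (43), (45), (47), (48), (50) and (53). □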
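
The Neumann series expansion (51), which drives the bound (52), is a generic fact about matrices of norm less than one and can be checked in isolation. The sketch below uses an arbitrary random matrix `A` in place of $\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})$; the dimension, the seed and the norm level 0.5 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 8))
A *= 0.5 / np.linalg.norm(A, "fro")          # rescale so that ||A||_HS = 0.5 < 1

target = np.linalg.inv(np.eye(8) - A)        # (I - A)^{-1}
partial = np.zeros((8, 8))
term = np.eye(8)
for _ in range(60):                          # partial sum of A^j for j = 0, ..., 59
    partial += term
    term = term @ A

# Tail of the series from j = 1: ||sum_{j>=1} A^j||_HS <= ||A||_HS / (1 - ||A||_HS).
tail = np.linalg.norm(target - np.eye(8), "fro")
tail_bound = 0.5 / (1.0 - 0.5)
print(np.max(np.abs(partial - target)), tail, tail_bound)
```

The tail bound is the same geometric-series estimate that produces the denominator $1 - \|\mathfrak{R}_{J_n}(z)(\mathbf{R}_{n,J_n} - \mathbf{R}_{J_n})\|_{HS}$ in (52).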
Proof of Lemma 10. Write 

= tr((Rie,.-I^J:f)^(h.hD) 

Since ||AB||hs < II A||hs||B||hS) we conclude 
ll^n,jCfc^is ~ I^j„^,fc^^islliR-^" 

(54) 

< WiKl'kK'k - Pj„,;^)llHsl|R7:i'(h.hn'/'llHs. 

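We first address the second term on the right-hand side of (54). By definition
and the same argument that leads to (42),

(55) $$\|\mathbf{R}_{J_n,k}^{-1/2} (\hat{\mathbf{h}}_s \hat{\mathbf{h}}_s^T)^{1/2}\|^2_{HS} = \mathrm{tr}(\mathbf{R}_{J_n,k}^- \hat{\mathbf{h}}_s \hat{\mathbf{h}}_s^T) = O_p(1).$$

We next deal with the first term on the right-hand side of (54). Write

$$\mathbf{R}_{n,J_n,k}^{-1/2} \mathbf{R}_{J_n,k}^{1/2} - \mathbf{P}_{J_n,k} = (\mathbf{R}_{n,J_n,k}^{-1/2} - \mathbf{R}_{J_n,k}^{-1/2}) \mathbf{R}_{J_n,k}^{1/2}.$$

It follows from Lemma 18 that

(56) $$\|\mathbf{R}_{n,J_n,k}^{-1/2} \mathbf{R}_{J_n,k}^{1/2} - \mathbf{P}_{J_n,k}\|_{HS} = O_p(\sqrt{J_n/n}) = o_p(1).$$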



By (54)-(56), we conclude that (25) holds. 
Next, 



||Cn,fcJ in,kj\\]J- 



2 

2 
X 



= i^jHk i^j - V.) - - <!k)-jf^j.,k 
X i^jllki^j - V,) - {K!Zk - ^jl!k)^j) 

<2(u,-v,-)^Pj„,fc(u,--v,-) 

+ 2v^(R„^j{f^fc - Rj^^i^)Rj„,fc(R„,j^^fc - Rj„^i^)v 
< 2\\uj - Vj||^j„ + 2||(R„_"|£^^^ - Rj^^^{.^)Pj,fc||Hsl|R-j(^fc||HS- 

The first term tends to 0 in probability by the previous part, (25), whereas
the second term converges to 0 in probability as in (56). □
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